elegant-parrot-47406
02/06/2025, 6:17 PM
flyte-binary and specify the following configuration:
configuration:
  inline:
    plugins:
      k8s:
        resource-tolerations:
          - nvidia.com/gpu:
            - key: "mykey"
              operator: "Equal"
              value: "myvalue"
              effect: "NoSchedule"
Additionally, we specify the `nodeSelector`:
configuration:
  inline:
    plugins:
      k8s:
        gpu-device-node-label: "cloud.google.com/gke-accelerator"
Moreover, we set default-node-selector to target another node pool (spot), which runs the other, CPU-focused workloads.
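For reference, a minimal sketch of how the GPU label and the default selector might sit together in the flyte-binary values. The spot-pool label `cloud.google.com/gke-spot: "true"` is an assumption (the standard GKE Spot VM node label); the actual default-node-selector value was not shared in the thread.

```yaml
configuration:
  inline:
    plugins:
      k8s:
        # Node label key identifying the GPU accelerator type
        gpu-device-node-label: "cloud.google.com/gke-accelerator"
        # ASSUMED value: points at the spot node pool.
        # Per the replies below, this selector is injected into every task Pod.
        default-node-selector:
          cloud.google.com/gke-spot: "true"
```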
The issue we're facing is that the Pod requesting a GPU seems to get matched with the correct gpu-device-node-label, but scheduling then fails with an error stating that the GPU node (selected via gpu-device-node-label) does not match default-node-selector.
Additionally, when inspecting the pod requesting a GPU with kubectl, I notice that the Node-Selectors field still includes default-node-selector.
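A Kubernetes nodeSelector only matches nodes that carry every listed label with exactly the listed value, so a GPU Pod that still has the default selector can only land on a node that is both a GPU node and in the spot pool. A hypothetical illustration (node and Pod names, the `nvidia-tesla-t4` accelerator value and the spot label are assumptions, not taken from this cluster):

```yaml
# Hypothetical labels on a node in the (non-spot) GPU pool
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-example        # hypothetical name
  labels:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
---
# Hypothetical fragment of the GPU task Pod (containers omitted)
apiVersion: v1
kind: Pod
metadata:
  name: gpu-task-example        # hypothetical name
spec:
  nodeSelector:
    # Injected from default-node-selector. The GPU node above lacks this
    # label, so the scheduler filters it out even though it has a GPU.
    cloud.google.com/gke-spot: "true"
```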
Can someone help me with this?

average-finland-92144
02/06/2025, 8:28 PM
kubectl describe of a task Pod?
As a side note, I think gpu-device-node-label should be called gpu-device-node-selector to reduce confusion.
So @elegant-parrot-47406, is the expected behavior that if a task requests a GPU, ONLY the GPU node selector is injected, and tasks that don't request GPUs ONLY get the default node selector?
Do you also get errors in tasks with no GPUs?

elegant-parrot-47406
02/07/2025, 8:29 AM
• According to the documentation, it's gpu-device-node-label.
• The behavior of pods not requesting GPUs is OK. It works as expected and they get the default node selector, so all good here.
• The expected behavior for pods requesting GPUs should be: only the GPU node selector is injected.

elegant-parrot-47406
02/07/2025, 8:31 AM
kubectl describe would show something like
nodeSelector:
  default-node-selector
  .....
tolerations:
  - effect: ....

average-finland-92144
02/10/2025, 6:58 PM
Oh sure, it's more like me complaining that Flyte should name it what it really is: a selector. The label is on the nodes, but maybe it's just a semantics issue. The behavior of the default-node-selector being injected into all Pods is expected.
02/10/2025, 7:00 PM
interruptible=True in the task decorator. To better control scheduling, you can set interruptible-node-selector to match the labels and conditions configured on your spot instances.

average-finland-92144
02/10/2025, 7:01 PM

average-finland-92144
02/10/2025, 7:01 PM
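For the interruptible-node-selector suggestion, a minimal sketch of the flyte-binary values, assuming the spot pool carries the GKE label `cloud.google.com/gke-spot: "true"`. The idea is that the spot selector is then applied only to tasks declared with `interruptible=True` in the task decorator, instead of being injected into every Pod through default-node-selector; leaving default-node-selector unset is an assumption about the intended setup.

```yaml
configuration:
  inline:
    plugins:
      k8s:
        gpu-device-node-label: "cloud.google.com/gke-accelerator"
        # default-node-selector intentionally omitted (assumption), so it is
        # no longer injected into GPU task Pods.
        # Applied only to Pods of tasks marked interruptible=True
        interruptible-node-selector:
          cloud.google.com/gke-spot: "true"   # ASSUMED spot-pool label
```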