elegant-parrot-47406
02/06/2025, 6:17 PM
`flyte-binary` and specify the following configuration:
configuration:
  inline:
    plugins:
      k8s:
        resource-tolerations:
          - nvidia.com/gpu:
            - key: "mykey"
              operator: "Equal"
              value: "myvalue"
              effect: "NoSchedule"
Additionally, we specify the `nodeSelector`:
configuration:
  inline:
    plugins:
      k8s:
        gpu-device-node-label: "cloud.google.com/gke-accelerator"
Moreover, we set `default-node-selector` to another node pool (spot) to run other CPU-focused workloads.
The issue we're facing is that the pod requesting a GPU seems to get matched with the correct node `gpu-device-node-label`, but it results in an error stating that the label of `gpu-device-node-label` does not match `default-node-selector`.
Additionally, when inspecting the pod requesting a GPU with `kubectl`, I notice that the `Node-Selectors` field still includes `default-node-selector`.
Can someone help me with this?
average-finland-92144
02/06/2025, 8:28 PM
`kubectl describe` of a task Pod?
As a side note, I think `gpu-device-node-label` should be called `gpu-device-node-selector` to reduce confusion.
So @elegant-parrot-47406, is the expected behavior that if a task requests a GPU, ONLY the GPU node selector is injected, and tasks that don't request GPUs ONLY get the default node selector?
Do you also get errors in tasks with no GPUs?
elegant-parrot-47406
02/07/2025, 8:29 AM
• According to the documentation it's `gpu-device-node-label`
• The behavior of pods not requesting GPUs is OK. It works as expected and they get the default node selector. So all good here
• The expected behaviour for pods requesting GPUs should be: only the GPU node selector is injected
elegant-parrot-47406
02/07/2025, 8:31 AM
`kubectl describe` would tell something like
nodeSelector:
  default-node-selector
  .....
tolerations:
  - effect: ....
average-finland-92144
02/10/2025, 6:58 PM
> • According to the documentation its `gpu-device-node-label`
Oh sure, it's more like me complaining that Flyte should name it what it really is: a selector (the label is on the nodes), but maybe that's just a semantics issue. The behavior of the `default-node-selector` being injected into all Pods is expected.
average-finland-92144
02/10/2025, 7:00 PM
`interruptible=True` in the task decorator. To better control scheduling, you can set `interruptible-node-selector` to match the labels and conditions that your spot instances have configured.
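For reference, the conflict discussed in this thread comes down to how Kubernetes evaluates `nodeSelector`: every key/value pair in the selector must be present on the node. The sketch below is a toy model in plain Python, not Flyte or Kubernetes code. The `pool` label and the `nvidia-tesla-t4` value are hypothetical placeholders; it assumes, as described above, that `default-node-selector` is injected into every Pod alongside the GPU selector.

```python
# Toy model only: this is not Flyte's or Kubernetes' implementation.
# The "pool" label and "nvidia-tesla-t4" value are hypothetical placeholders.

def node_matches(node_labels: dict, node_selector: dict) -> bool:
    """Mimic nodeSelector semantics: every selector pair must be on the node."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())

# Selector applied to every Pod (assumed to point at a spot CPU pool).
default_node_selector = {"pool": "spot-cpu"}
# Selector added for Pods that request a GPU.
gpu_node_selector = {"cloud.google.com/gke-accelerator": "nvidia-tesla-t4"}

# Labels carried by the two hypothetical node pools.
spot_cpu_node = {"pool": "spot-cpu"}
gpu_node = {"pool": "gpu", "cloud.google.com/gke-accelerator": "nvidia-tesla-t4"}

# A CPU-only Pod gets just the default selector.
cpu_pod_selector = dict(default_node_selector)
# A GPU Pod ends up with BOTH selectors merged together.
gpu_pod_selector = {**default_node_selector, **gpu_node_selector}

cpu_pod_fits_spot = node_matches(spot_cpu_node, cpu_pod_selector)
gpu_pod_fits_gpu_node = node_matches(gpu_node, gpu_pod_selector)
gpu_pod_fits_spot_node = node_matches(spot_cpu_node, gpu_pod_selector)
```

In this model the CPU-only Pod fits the spot node, while the GPU Pod fits neither pool, because no node carries both the default label and the accelerator label. That is consistent with the `Node-Selectors` field reported above still listing `default-node-selector` on the GPU Pod.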