I need to introduce a GPU node for some tasks in a workflow. I've read this introduction and understand Kubernetes taints and tolerations, but I don't understand the components of this YAML.
12/15/2023 3:53:22 PM UTC task submitted to K8s
12/15/2023 3:53:22 PM UTC Unschedulable:0/5 nodes are available: 1 Insufficient memory, 1 node(s) had untolerated taint {dedicated: flyte}, 2 Insufficient cpu, 4 Insufficient nvidia.com/gpu. preemption: 0/5 nodes are available: 1 Preemption is not helpful for scheduling, 4 No preemption victims found for incoming pod..
tall-lock-23197
12/18/2023, 5:36 AM
where are you setting this config? are you using the flyte binary helm chart? cc @average-finland-92144
average-finland-92144
12/18/2023, 12:59 PM
@echoing-carpenter-92090 from the config you shared, it looks like you have 2 taints set on the node but only one matching toleration. For the key you can use "nvidia.com/gpu", which, BTW, I think is the resource type exposed by the NVIDIA driver, so the key is restrictive here.
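As a rough illustration of the point above: a pod needs one toleration for every taint on the node it should land on. The sketch below assumes the GPU node carries the dedicated=flyte taint seen in the scheduler message plus a second taint keyed nvidia.com/gpu. The NoSchedule effect and the Exists operator are assumptions, not taken from the shared config, so match them to whatever `kubectl describe node` reports for the node.

```yaml
# Hypothetical pod-spec fragment: one toleration per taint on the GPU node.
tolerations:
  # Matches the {dedicated: flyte} taint from the scheduler message.
  # The effect is an assumption; use the effect actually set on the node.
  - key: "dedicated"
    operator: "Equal"
    value: "flyte"
    effect: "NoSchedule"
  # Matches a taint keyed nvidia.com/gpu regardless of its value.
  # "Exists" is an assumption; use "Equal" plus a value if the taint has one.
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"
```

Tolerations only let the pod onto the tainted node. The task still has to request nvidia.com/gpu as a resource so the scheduler picks a node that has GPUs available.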
Also, is that new node also tainted with