fancy-hamburger-89099
06/18/2024, 7:47 AM
configuration:
  inline:
    task_resources:
      defaults:
        cpu: 500m
        memory: 1Gi
        nvidia.com/gpu: "1"
      limits:
        cpu: 2
        memory: 2Gi
        nvidia.com/gpu: "1"
    plugins:
      k8s:
        inject-finalizer: true
        default-memory: 200Gi
        default-cpus: "20"
        resource-tolerations:
          - gpu:
              - key: "gpu"
                operator: "Equal"
                value: "true"
                effect: "NoSchedule"
        gpu-resource-name: "nvidia.com/gpu"
        default-node-selector:
          poolname: gpu
We can see it's present in the main Flyte pod config file, but the only thing being applied to the task pod is the nodeSelector:
nodeSelector:
  poolname: gpu
I really appreciate any help, thank you!
average-finland-92144
06/18/2024, 3:16 PM
average-finland-92144
06/18/2024, 3:18 PM
The key under resource-tolerations should match an ExtendedResource advertised by the device driver plugin on your K8s nodes. For NVIDIA accelerators, it's typically nvidia.com/gpu instead of just gpu.
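One way to check which extended resource name the device plugin advertises is to look at the node object itself, for example with `kubectl get node <node-name> -o yaml`. A rough sketch of the relevant part of that output on a GPU node follows; the values are illustrative, and the taint shown is an assumption that mirrors the toleration in the posted config rather than something confirmed in the thread:

```yaml
# Illustrative excerpt of a Node object; values are assumptions, not taken from the thread.
spec:
  taints:
    - key: gpu              # assumed custom taint, mirroring the toleration in the posted config
      value: "true"
      effect: NoSchedule
status:
  allocatable:
    nvidia.com/gpu: "1"     # extended resource name advertised by the NVIDIA device plugin
```

The name under `status.allocatable` is the one the resource-tolerations key should use.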
average-finland-92144
06/18/2024, 3:22 PM
gpu-resource-name is, I haven't needed it and have been able to use GPUs on tainted nodes in Azure
average-finland-92144
06/18/2024, 3:37 PM
gpu to nvidia.com/gpu under resource-tolerations
Also bear in mind that even if you don't set this list, flytepropeller should inject an nvidia.com/gpu:NoSchedule toleration when you request a GPU device. The list is useful if your GPU nodes have additional taints.
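A minimal sketch of the rename suggested above, assuming the rest of the plugins.k8s block stays as originally posted and keeping the same list layout; only the key changes from gpu to nvidia.com/gpu:

```yaml
configuration:
  inline:
    plugins:
      k8s:
        resource-tolerations:
          - nvidia.com/gpu:          # was "gpu"; should match the extended resource name
              - key: "gpu"           # taint key on the GPU node pool, as in the original config
                operator: "Equal"
                value: "true"
                effect: "NoSchedule"
```

The toleration entry itself (key, operator, value, effect) is unchanged, since it has to match the custom taint on the GPU node pool.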