Hi,
I’m using the Flyte sandbox and trying to run some tasks in Flyte.
I have a task that needs to access a GPU on my host machine. Since the GPU is on my host machine, it needs to be passed through to the k8s pod that’s running inside the Docker container (sandbox). However, I see that kubectl is not able to schedule the pod.
Events:
Type     Reason            Age  From               Message
----     ------            ---- ----               -------
Warning  FailedScheduling  11m  default-scheduler  0/1 nodes are available: 1 Insufficient nvidia.com/gpu, 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
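For reference, the task’s pod does request a GPU (that’s what the Insufficient nvidia.com/gpu message refers to); the resource request can be checked with something like this (the pod name below is a placeholder):
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].resources}'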
I also checked kubectl describe node 31e2e2ba6f9f (31e2e2ba6f9f is the flyte-sandbox container in which Flyte is running in the sandbox environment).
It looks like the node (the Docker container, in this case) itself doesn’t have a GPU to allocate to the pod:
>kubectl describe node 31e2e2ba6f9f
...
Capacity:
  cpu:                8
  ephemeral-storage:  944801904Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65774996Ki
  pods:               110
Allocatable:
  cpu:                8
  ephemeral-storage:  919103291491
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65774996Ki
  pods:               110
...
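For comparison, on a node where the NVIDIA device plugin is running, an nvidia.com/gpu entry would show up under Capacity and Allocatable. A quick way to check (using the same node name) is something like:
kubectl get node 31e2e2ba6f9f -o jsonpath='{.status.allocatable}'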
I think this is happening because the flyte-sandbox Docker container is not able to access the GPUs. I logged into the container, and nvidia-smi is not available there, which it needs to be for the container to access the GPUs:
❯ docker exec -it 31e2e2ba6f9f /bin/sh
/ # nvidia-smi
/bin/sh: nvidia-smi: not found
/ #
Note that outside of Flyte, I’m able to run the task image in a Docker container on my host machine as below, which needs --gpus all to be passed for the container to access the GPUs:
docker run --gpus all <task_image_with_gpu_req>
I think if we could somehow pass --gpus all to the docker run command used for the sandbox container during flytectl demo start, it should work.
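From what I’ve read (this part is an assumption on my side, I haven’t been able to try it since the container has no GPU access to begin with), --gpus all alone may not be enough: the Kubernetes cluster inside the sandbox would also need the NVIDIA device plugin running so the node starts advertising nvidia.com/gpu, e.g. something like the DaemonSet from the NVIDIA k8s-device-plugin repo (the exact manifest URL/version should be checked against their README):
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml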
Please help !!!