# ask-ai
@ancient-wolf-19325 I have multiple tasks in my workflow. Only the last task needs a GPU; everything else does not. All tasks also share the same image, so I only set GPU requirements on the last task. When I run my workflow, `torch.cuda.is_available()` is `False` on the final task. Why does this happen? To test further, I modified all my tasks to request a GPU. When I do this, `torch.cuda.is_available()` is `True` in my final task. This does not make sense to me, since each task is supposed to run in its own container.
This issue is likely due to how Flyte schedules pods based on resource requests: only a task that requests a GPU gets its pod scheduled on a node with GPUs. However, if your container image is not built with CUDA support, or the pod is not running on a node with GPU access, `torch.cuda.is_available()` will return `False`. When you set GPU requirements on all tasks, every pod lands on a GPU node, so CUDA detection works in every task, including the last one. This suggests your image and cluster are set up correctly, but the last task only gets a GPU if its pod is scheduled on a GPU node, which only happens when that task specifically requests one. Make sure your final task's decorator includes `requests=Resources(gpu="1")` (and optionally `limits=Resources(gpu="1")`), and that your Flyte cluster has available GPU nodes. Also ensure your image is built with CUDA support. If these are set, only the task(s) with GPU requests will run on GPU nodes, and `torch.cuda.is_available()` should be `True` only in those tasks. See the Flyte GPU configuration docs and task resource settings for more details. Would you like a step-by-step breakdown or example code?

Sources:
- Configuring Access to GPUs
- Task resource settings
@ancient-wolf-19325 The problem happens when I request a GPU for the last task only; the rest do not have a GPU.
If only your last task requests a GPU and the rest do not, but `torch.cuda.is_available()` is still `False` in the last task, the most likely cause is that the pod for the last task is not being scheduled on a GPU node. This can happen if your cluster has no available GPU nodes, or if the Flyte/Kubernetes configuration for GPU scheduling is incorrect. Each task runs in its own pod, so only the task with `requests=Resources(gpu="1")` should be scheduled on a GPU node, and only that pod should have access to the GPU and CUDA. If all tasks request a GPU, then all pods are scheduled on GPU nodes and `torch.cuda.is_available()` is `True` everywhere. But if only the last task requests a GPU and it still cannot see one, check that your cluster has available GPU nodes, that node labels and taints are set up correctly, and that your Flyte deployment is configured to schedule GPU tasks properly. See Configuring Access to GPUs for troubleshooting. Would you like more detail on how to verify your cluster and Flyte configuration?

Sources:
- Configuring Access to GPUs