square-agency-59624
06/17/2025, 1:45 PM
torch.cuda.is_available() is false on the final task. Why does this happen?
To test this further, I modified all my tasks to request a GPU. When I do this, torch.cuda.is_available() is true in my final task. This does not make sense to me, as each task is supposed to run in its own container.
jolly-florist-66728
06/17/2025, 1:45 PM
If your final task does not request a GPU, its pod is scheduled on a CPU-only node, so torch.cuda.is_available() will return False. When you set GPU requirements for all tasks, all pods are scheduled on GPU nodes, so the environment is correct for CUDA detection in every task, including the last one. This suggests your image and cluster are set up correctly, but the last task only gets a GPU if its pod is scheduled on a GPU node, which only happens if you request a GPU for that task specifically.
Make sure your final task's decorator includes requests=Resources(gpu="1") (and optionally limits=Resources(gpu="1")), and that your Flyte cluster has available GPU nodes. Also, ensure your image is built with CUDA support. If these are set, only the task(s) with GPU requests will run on GPU nodes, and torch.cuda.is_available() should be True only in those tasks. See the Flyte GPU configuration docs and task resource settings for more details.
Would you like a step-by-step breakdown or example code?
Sources:
- Configuring Access to GPUs
- task resource settings
square-agency-59624
06/17/2025, 2:40 PM
jolly-florist-66728
06/17/2025, 2:40 PM
If torch.cuda.is_available() is still False in the last task, the most likely cause is that the pod for the last task is not being scheduled on a GPU node. This can happen if your cluster does not have available GPU nodes, or if the Flyte/Kubernetes configuration for GPU scheduling is not correct. Each task runs in its own pod, so only the task with requests=Resources(gpu="1") should be scheduled on a GPU node, and only that pod should have access to the GPU and CUDA.
If all tasks request GPU, then all pods are scheduled on GPU nodes, so torch.cuda.is_available() is True everywhere. But if only the last task requests GPU and it still cannot see the GPU, check that your cluster has available GPU nodes, that the node labels and taints are set up correctly, and that your Flyte deployment is configured to schedule GPU tasks properly. See Configuring Access to GPUs for troubleshooting.
Would you like more detail on how to verify your cluster and Flyte configuration?
Sources:
- Configuring Access to GPUs