# runllm-flyte-trial
New RunLLM response generated:

*# Question*
I am running the PyTorch Lightning MNIST example (https://docs.flyte.org/en/latest/flytesnacks/examples/kfpytorch_plugin/pytorch_lightning_mnist_autoencoder.html) on my homelab computer. First it complained that I should not use `cuda="12.1.0"`, so I replaced it with `conda_channels=["nvidia"]`, and now I can start the workflow execution on my cluster. However, it does not seem to use the GPU at all: `nvidia-smi` shows 0% volatile utilization, and I expected the fan to go crazy. Should I not use Elastic with only one GPU? How can I check what the job is doing right now?

*# Answer*
To ensure GPU utilization, verify that the task requests GPU resources in Flyte. Use the `Resources` parameter in the task decorator to specify GPU requirements, for example `@task(requests=Resources(gpu="1"))`.
Check that the GPU is correctly configured and accessible from the container: run `nvidia-smi` inside the container to confirm the GPU is visible.
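One way to run that check from inside the task itself is a small diagnostic task; this is a sketch (the task name `gpu_debug` is made up) that reports what the container actually sees, which helps rule out scheduling problems:

```python
# Hedged diagnostic sketch: a throwaway task that reports GPU visibility.
import subprocess

import torch
from flytekit import Resources, task


@task(requests=Resources(gpu="1"), limits=Resources(gpu="1"))
def gpu_debug() -> str:
    report = [
        f"torch.cuda.is_available() = {torch.cuda.is_available()}",
        f"torch.cuda.device_count() = {torch.cuda.device_count()}",
    ]
    try:
        # nvidia-smi only exists in the container if the NVIDIA runtime exposed the GPU.
        report.append(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
    except FileNotFoundError:
        report.append("nvidia-smi not found inside the container")
    return "\n".join(report)
```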
If you are using Elastic, make sure it is configured for single-GPU use; Elastic is typically used for distributed training, but it also works with a single GPU (see the sketch at the end of this message). To see what the job is doing right now, check the task logs for execution details.

Would you like more detail?

Sources:
- Flyte PyTorch Example
- Flyte GPU Configuration Discussion

Slack Link: https://.slack.com/archives/CP2HDHKE1/p1729632356.807919

💡 Hint: Mention @runllm in the thread for followups.
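For the Elastic question specifically, here is a minimal single-GPU sketch, assuming the `flytekitplugins-kfpytorch` plugin used by the linked example is installed (resource values are illustrative):

```python
# Sketch of a single-GPU Elastic configuration; values are illustrative.
from flytekit import Resources, task
from flytekitplugins.kfpytorch import Elastic


@task(
    task_config=Elastic(nnodes=1, nproc_per_node=1),  # one node, one worker process
    requests=Resources(gpu="1", mem="8Gi"),
    limits=Resources(gpu="1"),
)
def train() -> None:
    # With a single node, training runs in one pod on the one requested GPU,
    # so Elastic itself is not what prevents GPU usage.
    ...
```

To watch what the job is doing, the task's pod logs (reachable from the log link in the Flyte console, or via `kubectl logs` on the task pod) are usually the quickest check.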