# flyte-support
I am running the PyTorch Lightning MNIST example (https://docs.flyte.org/en/latest/flytesnacks/examples/kfpytorch_plugin/pytorch_lightning_mnist_autoencoder.html) on my homelab computer. First it complained that I should not use `cuda="12.1.0"`, so I replaced it with `conda_channels=["nvidia"]`. Now the workflow starts executing on my cluster, but it does not seem to use the GPU at all: `nvidia-smi` shows 0% volatile utilization, and I expected the fan to go crazy. Should I not use `Elastic` with only one GPU? How can I check what the job is doing right now?
Cc @broad-monitor-993
Cc @flaky-parrot-42438
You can use `Elastic` with one GPU. Usually the problem is a mismatch between the driver and the library version.
@wonderful-air-12263 what does your `ImageSpec` look like?
It’s important to check that the built image has CUDA drivers matching your PyTorch version: test with `torch.cuda.is_available()`.
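A minimal sketch of such a check that you could run inside the container (or drop into a task) to log GPU visibility; it assumes only that `torch` may or may not be importable in the image:

```python
def describe_cuda() -> str:
    """Return a one-line diagnostic of CUDA visibility inside the container."""
    try:
        import torch  # may be absent if the image was built without it
    except ImportError:
        return "torch is not installed in this image"
    if not torch.cuda.is_available():
        # Typical causes: a CPU-only torch wheel, or a driver/runtime mismatch
        return f"torch {torch.__version__}: CUDA not available"
    return (
        f"torch {torch.__version__}: {torch.cuda.device_count()} GPU(s), "
        f"first is {torch.cuda.get_device_name(0)}"
    )


if __name__ == "__main__":
    print(describe_cuda())
```

If this prints "CUDA not available" inside the pod, the 0% utilization in `nvidia-smi` is expected: the training is silently falling back to CPU.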
You can do `pyflyte build <script>.py <workflow/task_name>` to build the image locally.
also, what does your `Elastic` task config look like?
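For reference, a sketch of the two things being asked about, assuming `flytekit` with the `flytekitplugins-kfpytorch` plugin; the package list and pins are illustrative, not verified:

```python
from flytekit import ImageSpec, Resources, task
from flytekitplugins.kfpytorch import Elastic

# ImageSpec built from conda_channels/conda_packages instead of the old
# cuda="12.1.0" argument; exact packages here are an assumption.
custom_image = ImageSpec(
    packages=["torch", "lightning", "flytekitplugins-kfpytorch"],
    conda_channels=["nvidia"],
    conda_packages=["cuda-runtime"],
)

# Single-node, single-worker Elastic config for a one-GPU homelab box.
@task(
    task_config=Elastic(nnodes=1, nproc_per_node=1),
    container_image=custom_image,
    requests=Resources(gpu="1"),
)
def train() -> None:
    ...
```

With `nnodes=1` and `nproc_per_node=1` this is a valid single-GPU setup, so `Elastic` itself is not the problem; the GPU request and the CUDA-enabled image are what make the device actually visible to the task.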