rapid-artist-48509

03/03/2025, 11:34 PM

Dear Flyte team, I'm looking at trying to use GPUs with the Sandbox and I see there is some existing interest. Was hoping to drop-test an idea 🧵

rapid-artist-48509

03/03/2025, 11:34 PM

FWIW
• I see this PR is in progress: https://github.com/flyteorg/flyte/pull/4340
• and search results like these are very helpful: https://discuss.flyte.org/t/13883438/hi-i-m-using-flyte-sandbox-and-trying-to-run-some-tasks-in-f

rapid-artist-48509

03/03/2025, 11:40 PM

I'm wondering if I can simply do the following to get a hacky solution:
• start the sandbox container with gpus=all and/or just have my host's Docker use nvidia-docker by default
• use helm to install the NVIDIA device plugin ( https://github.com/NVIDIA/k8s-device-plugin#quick-start ) into the sandboxed k3s; in theory, k8s should then report GPUs
• run a job that uses an ImageSpec and requests GPUs

The cited PR appears to add GPU support to the sandbox more generally (which would be amazing!). But I anticipate that I might not use the sandbox outside of a demo, so a "hacky" solution could work. For a production cluster, even a small one, I think I could follow the Flyte docs, and I anticipate I'd essentially be doing the above: start my own k8s / k3s cluster, helm install the NVIDIA device plugin, then helm install Flyte. It looks like Flyte simply respects the NVIDIA device plugin labels. Right now, though, I want to use the sandbox in more of a "demo" capacity.
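
For the "k8s should report GPUs" step, here is a rough sketch of how I'd sanity-check it after installing the device plugin. It only parses the output of `kubectl get nodes -o json` and looks for the `nvidia.com/gpu` resource that the NVIDIA device plugin advertises. The function names (`gpus_per_node`, `fetch_nodes`, `main`) are my own, and it assumes kubectl is on PATH with its current context pointing at the sandbox's k3s cluster; I haven't confirmed this works against the sandbox.

```python
import json
import subprocess

# Extended resource name advertised by the NVIDIA k8s device plugin.
GPU_RESOURCE = "nvidia.com/gpu"


def gpus_per_node(node_list: dict) -> dict:
    """Map node name -> allocatable GPU count, given parsed `kubectl get nodes -o json` output."""
    counts = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Nodes without the device plugin simply lack the key, so default to 0.
        counts[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return counts


def fetch_nodes() -> dict:
    """Assumption: kubectl is installed and already targets the sandbox cluster."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def main() -> None:
    for name, count in gpus_per_node(fetch_nodes()).items():
        print(f"{name}: {count} GPU(s) allocatable")
```

Calling `main()` after the helm install should print a non-zero count for the sandbox node if the plugin can see the GPUs; a count of 0 would suggest the container runtime inside the sandbox isn't exposing them.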

rapid-artist-48509

03/03/2025, 11:41 PM

A similar question would be: how do folks run the mnist_classifier demo ( https://docs.flyte.org/en/latest/flytesnacks/examples/mnist_classifier/index.html ) in the sandbox? Or is it more common to test it on e.g. GKE?

freezing-airport-6809

03/04/2025, 3:52 AM

You can run it on union serverless

freezing-airport-6809

03/04/2025, 5:51 AM

signup.union.ai. It's a test and demo environment.

freezing-airport-6809

03/04/2025, 7:11 PM

Or else we can deploy to your cloud, as a Union environment. I can do it for free if we move quickly 😄
