r
Dear Flyte team, I'm looking at trying to use GPUs with the Sandbox and I see there is some existing interest. Was hoping to drop-test an idea 🧵
FWIW:
• I see this PR is in progress: https://github.com/flyteorg/flyte/pull/4340
• and search results like these are very helpful: https://discuss.flyte.org/t/13883438/hi-i-m-using-flyte-sandbox-and-trying-to-run-some-tasks-in-f
I'm wondering if I can simply do the following to get a hacky solution:
• start the sandbox container with gpus=all and/or just have my host's docker use nvidia-docker by default
• use helm to install the nvidia device plugin ( https://github.com/NVIDIA/k8s-device-plugin#quick-start ) into the sandboxed k3s. In theory, k8s should now report GPUs
• use a job that uses an ImageSpec and requests GPUs
The cited PR appears to add GPU support to the sandbox more generally (which would be amazing!). But I anticipate that I might not use the sandbox outside of a demo, so a "hacky" solution could work. For a production cluster, even a small one, I think I could follow the Flyte docs, and I anticipate that I'd essentially be doing the above, i.e. start my own k8s / k3s cluster, then helm install the nvidia device plugin, then helm install flyte. And it looks like Flyte simply respects the nvidia device plugin labels. Right now, though, I want to use the sandbox in more of a "demo" capacity.
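To check the "k8s should report GPUs" step after installing the device plugin, something like the following could work. It is only a sketch: it assumes `kubectl` is on the PATH with its current context pointing at the sandbox's k3s cluster, and that the plugin advertises GPUs under the `nvidia.com/gpu` resource name. The function names are made up for illustration.

```python
import json
import subprocess
from typing import Dict

# Resource name the NVIDIA device plugin is assumed to advertise on each node.
GPU_RESOURCE = "nvidia.com/gpu"


def count_allocatable_gpus(nodes_json: str) -> Dict[str, int]:
    """Map node name -> allocatable GPU count, given `kubectl get nodes -o json` output."""
    nodes = json.loads(nodes_json).get("items", [])
    counts: Dict[str, int] = {}
    for node in nodes:
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Nodes without the device plugin simply lack the key, so default to 0.
        counts[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return counts


def gpus_in_current_cluster() -> Dict[str, int]:
    """Query the cluster behind the current kubectl context (assumed to be the sandbox)."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return count_allocatable_gpus(result.stdout)
```

If `gpus_in_current_cluster()` returns 0 for every node, the GPU request in the task would presumably stay unschedulable, so this is worth running before submitting the job.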
A similar question would be: how do folks run the mnist_classifier demo ( https://docs.flyte.org/en/latest/flytesnacks/examples/mnist_classifier/index.html ) in the sandbox? Or is it more common to run it on e.g. GKE?
f
You can run it on union serverless
signup.union.ai, it's a test and demo environment
Or else we can deploy to your cloud - a union environment. I can do it for free if we move quick 😄