I'm wondering if I can simply do the following to get a hacky solution:
• start the sandbox container with gpus=all and/or just have my host's docker use nvidia-docker by default
• use helm to install the nvidia device plugin ( https://github.com/NVIDIA/k8s-device-plugin#quick-start ) into the sandboxed k3s. in theory, k8s should then report GPUs as allocatable resources
• use a job that uses an ImageSpec and requests GPUs
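For the last step, here's roughly what I have in mind (untested sketch — `Resources(gpu="1")` and the `ImageSpec` fields are my reading of the flytekit docs, and the registry address is my assumption about the sandbox's built-in registry):

```python
# Sketch of a GPU-requesting task (untested; assumes flytekit's
# ImageSpec and Resources behave the way I think they do).
from flytekit import ImageSpec, Resources, task

# Hypothetical image spec -- package list and registry are placeholders.
gpu_image = ImageSpec(
    name="gpu-demo",
    packages=["torch"],
    registry="localhost:30000",  # assumed: the sandbox's local registry
)

@task(container_image=gpu_image, requests=Resources(gpu="1"))
def check_gpu() -> bool:
    # If the device plugin + gpus=all plumbing worked, this should be True.
    import torch
    return torch.cuda.is_available()
```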
So the cited PR appears to add GPU support to the sandbox more generally (which would be amazing!). But I'm anticipating that I might not use the sandbox outside of a demo, and thus a "hacky" solution could work for me.
I think for a production cluster, even a small one, I could follow the Flyte docs, and I anticipate I'd essentially be doing the above. I.e. start my own k8s / k3s cluster, then helm install the nvidia device plugin, then helm install Flyte. And it looks like Flyte simply respects the resources the nvidia device plugin advertises on the nodes.
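For reference, my understanding of that sequence (untested sketch — the device plugin commands are adapted from its quick-start; the release/namespace names and the `flyte-binary` chart choice are my own assumptions):

```shell
# 1. nvidia device plugin, per its quick-start
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace

# 2. Flyte itself (assuming the single-binary chart)
helm repo add flyteorg https://flyteorg.github.io/flyte
helm install flyte-backend flyteorg/flyte-binary \
  --namespace flyte --create-namespace

# 3. sanity check: nodes should now advertise nvidia.com/gpu
kubectl describe nodes | grep -i "nvidia.com/gpu"
```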
But right now, I want to use the sandbox in more of a "demo" capacity.