# flyte-support
q
Hello ^_^ Before trying this out myself, has there been any attempts/thoughts on enabling nvidia-acceleration for the single container demo cluster?
f
No, there have been no attempts. If you happen to do it, please share 🤯
👍 1
q
Ok, got it working finally... based the sandbox on
nvidia/cuda:11.3.0-runtime-ubuntu20.04
instead of
rancher/k3s:v1.24.4-k3s1
and installed k3s and crictl during the docker build... haven't worked with docker buildx before, so not sure how to beautify it ^_^ will see if I can clean it up a little further
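A rough sketch of what that image swap could look like, assuming illustrative version pins and install paths (the actual Dockerfile in the PR may differ):

```dockerfile
# Hypothetical sketch: base the sandbox on the CUDA runtime image
# instead of rancher/k3s, then layer k3s and crictl on top at build time.
FROM nvidia/cuda:11.3.0-runtime-ubuntu20.04

# Versions below are illustrative assumptions, not the PR's actual pins.
ARG K3S_VERSION=v1.24.4%2Bk3s1
ARG CRICTL_VERSION=v1.24.2

RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Install the k3s binary from the upstream release artifacts.
RUN curl -fsSL -o /usr/local/bin/k3s \
      "https://github.com/k3s-io/k3s/releases/download/${K3S_VERSION}/k3s" \
    && chmod +x /usr/local/bin/k3s

# Install crictl for debugging the container runtime inside the sandbox.
RUN curl -fsSL \
      "https://github.com/kubernetes-sigs/cri-tools/releases/download/${CRICTL_VERSION}/crictl-${CRICTL_VERSION}-linux-amd64.tar.gz" \
    | tar -xz -C /usr/local/bin

ENTRYPOINT ["/usr/local/bin/k3s"]
CMD ["server"]
```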
❤️ 1
The sandbox-cuda container comes in at a hefty 3.7GB, but will come in handy for our engineers I hope. Really nice sandbox you got there! 🙂
f
Cc @freezing-boots-56761 / @thankful-minister-83577
@quick-salesclerk-18019 happy to host it with the core Flyte sandbox
And also add flytectl demo start --gpu?
❤️ 1
f
Neat. We could also possibly consider a “multi-node” setup where one of the nodes has GPU drivers and can be fired up in an opt-in way. GPU workloads can then be configured to schedule on this node via the usual affinity.
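As a sketch of that opt-in scheduling, assuming a hypothetical node label like `node-role/gpu: "true"` on the GPU node, a workload could be pinned to it with a nodeSelector (a nodeAffinity rule would work equally well):

```yaml
# Hypothetical example: the label name and pod details are illustrative,
# not from the actual PR or the Flyte sandbox manifests.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-smoke-test
spec:
  # Only schedule on the node that carries the (assumed) GPU label.
  nodeSelector:
    node-role/gpu: "true"
  containers:
    - name: cuda
      image: nvidia/cuda:11.3.0-runtime-ubuntu20.04
      command: ["nvidia-smi"]
      resources:
        limits:
          # Requires the NVIDIA device plugin to be running on the node.
          nvidia.com/gpu: 1
```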
❤️ 1
q
How would you prefer to have it delivered? As a PR or just as files that you can fit into the build system according to your wishes? If PR, should I add it as a build target for the Makefile in sandbox-bundled or make a new sandbox directory you think?
f
i’m thinking probably a build target in sandbox-bundled. if you can open a draft PR, we can discuss further. Perhaps we can reorganize the current default stage to be based on some version of ubuntu that would make layering GPU drivers+cuda on easier too.
👍 2
q
Will do!
Ack, realised I forgot to add the k3s config for the local repo 😕
ah well, let's see what you think about the PR first ^_^
f
cc @freezing-boots-56761 / @thankful-minister-83577 will need your expertise
q
To run the image you need to have an nvidia-enabled docker... I did this by installing (in ubuntu 20.04)
nvidia-docker2
and configuring
/etc/docker/daemon.json
like so:
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
and then restarting docker
also, during docker run you need to pass
--gpus all
as an arg
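Putting it together, the run could look something like this (the image name and port mapping are placeholders, not the actual sandbox invocation; only `--gpus all` is the flag described above):

```shell
# Hypothetical invocation: image name/tag and ports are placeholders.
# --gpus all requires nvidia-docker2 and the nvidia runtime entry
# in /etc/docker/daemon.json, with docker restarted afterwards.
docker run --rm --privileged --gpus all \
  -p 6443:6443 \
  flyte-sandbox-cuda:latest
```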
f
we will need to run on a cloud instance with gpu 🙂
👍 1
f
are y’all running this on developer machines @quick-salesclerk-18019?
just curious if users get gpu-enabled dev machines, run their own hardware, or get a gpu-enabled VM in the cloud.
also some additional related docs: https://k3d.io/v5.4.6/usage/advanced/cuda/
q
@freezing-boots-56761 During development I have been running this on a workstation at home... Hopefully our data scientists can run this either on their gpu-enabled workstations or on a vertex ai workbench in GCP with an attached T4 GPU
Saw some references to local k3d development as well, but not until after I submitted the PR ^_^ I think the demo setup is really handy
@freezing-boots-56761 I put some links in the PR from where I found info... This one was a nice one: https://itnext.io/enabling-nvidia-gpus-on-k3s-for-cuda-workloads-a11b96f967b0
f
awesome thanks. will take a look tomorrow 🙂
👍 1