# ask-the-community
b
Hello ^_^ Before trying this out myself, have there been any attempts or thoughts on enabling NVIDIA acceleration for the single-container demo cluster?
k
No, there have been no attempts. If you happen to do it, please share 🤯
b
OK, finally got it working... I based the sandbox on `nvidia/cuda:11.3.0-runtime-ubuntu20.04` instead of `rancher/k3s:v1.24.4-k3s1` and installed k3s and crictl during the docker build. I haven't worked with docker buildx before, so I'm not sure how to beautify it ^_^ I'll see if I can clean it up a bit further.
The sandbox-cuda image comes in at a hefty 3.7 GB, but I hope it will come in handy for our engineers. Really nice sandbox you've got there! 🙂
k
Cc @jeev / @Yee
@Björn happy to host it with the core Flyte sandbox
And also add `flytectl demo start --gpu`?
j
Neat. We could also consider a “multi-node” setup where one of the nodes has GPU drivers and can be started on an opt-in basis. GPU workloads could then be configured to schedule on that node via the usual node affinity.
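To make the node-affinity idea concrete, here is a minimal sketch in Python that builds a plain Kubernetes Pod manifest which can only be scheduled on an opt-in GPU node. It assumes a hypothetical node label `example.com/gpu-node=true` on that node and that the NVIDIA device plugin is installed so the node advertises the `nvidia.com/gpu` resource; neither exists in the current sandbox, and a Flyte task would normally get this through its own pod configuration rather than a raw Pod.

```python
import json

# Hypothetical label applied to the opt-in GPU node, e.g. with
# `kubectl label node <gpu-node> example.com/gpu-node=true`.
GPU_NODE_LABEL = "example.com/gpu-node"


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that can only be scheduled on the labelled GPU node."""
    node_affinity = {
        # Hard requirement: the scheduler rejects nodes without the label.
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
                {
                    "matchExpressions": [
                        {"key": GPU_NODE_LABEL, "operator": "In", "values": ["true"]}
                    ]
                }
            ]
        }
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "affinity": {"nodeAffinity": node_affinity},
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    # Extended resource exposed by the NVIDIA device plugin (assumed installed).
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest("cuda-smoke-test", "nvidia/cuda:11.3.0-runtime-ubuntu20.04")
    print(json.dumps(manifest, indent=2))
```

The JSON output can be piped into `kubectl apply -f -`. Using required (rather than preferred) affinity keeps GPU workloads off the CPU-only nodes entirely, which matches the opt-in idea.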
b
How would you prefer to have it delivered: as a PR, or just as files that you can fit into the build system however you like? If a PR, do you think I should add it as a build target in the sandbox-bundled Makefile, or make a new sandbox directory?
j
I'm thinking probably a build target in sandbox-bundled. If you can open a draft PR, we can discuss further. Perhaps we can also reorganize the current default stage to be based on a version of Ubuntu, which would make layering GPU drivers and CUDA on top easier.
b
Will do!
Ack, realised I forgot to add the k3s config for the local repo 😕
ah well, let's see what you think about the PR first ^_^
k
cc @jeev / @Yee will need your expertise
b
To run the image you need an NVIDIA-enabled Docker... I did this (on Ubuntu 20.04) by installing `nvidia-docker2` and using `/etc/docker/daemon.json` like so:
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
and then restart Docker.
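If `/etc/docker/daemon.json` already holds other settings, overwriting it with the snippet above would drop them. Here is a minimal Python sketch that merges the same `nvidia` runtime entry into an existing config instead; the helper names are hypothetical, and the `nvidia-docker2` package may already have written this entry for you.

```python
import json
from pathlib import Path

DAEMON_JSON = Path("/etc/docker/daemon.json")


def add_nvidia_runtime(config: dict) -> dict:
    """Return a copy of a daemon.json config with the "nvidia" runtime registered.

    Other top-level keys and any existing runtimes are kept unchanged.
    """
    merged = dict(config)
    runtimes = dict(merged.get("runtimes", {}))
    runtimes["nvidia"] = {"path": "nvidia-container-runtime", "runtimeArgs": []}
    merged["runtimes"] = runtimes
    return merged


def update_daemon_json(path: Path = DAEMON_JSON) -> None:
    """Merge the nvidia runtime into daemon.json in place (root needed for the default path)."""
    text = path.read_text().strip() if path.exists() else ""
    existing = json.loads(text) if text else {}
    path.write_text(json.dumps(add_nvidia_runtime(existing), indent=2) + "\n")


if __name__ == "__main__":
    update_daemon_json()
    print(f"Updated {DAEMON_JSON}; restart Docker for the change to take effect.")
```

Run it with `sudo`, then restart Docker (e.g. `sudo systemctl restart docker` on Ubuntu 20.04) so the daemon picks up the new runtime.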
Also, you need to pass `--gpus all` as an argument to `docker run`.
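As a small illustration, the sketch below assembles the `docker run` command line with `--gpus all` in front of any other options. The image tag `sandbox-cuda:latest` is a placeholder for whatever tag you built, and `--privileged` is included on the assumption that k3s running inside Docker needs it; the real sandbox will likely also need port mappings and volumes not shown here.

```python
import shlex
from typing import List, Optional, Sequence


def gpu_docker_run_argv(image: str, extra_args: Optional[Sequence[str]] = None) -> List[str]:
    """Build a `docker run` argv that exposes all host GPUs to the container."""
    argv = ["docker", "run", "--gpus", "all"]
    argv.extend(extra_args or [])  # e.g. --privileged, -p mappings (assumptions)
    argv.append(image)             # image name always comes last
    return argv


if __name__ == "__main__":
    # "sandbox-cuda:latest" is a placeholder tag, not an official image.
    argv = gpu_docker_run_argv("sandbox-cuda:latest", ["--privileged"])
    print(shlex.join(argv))  # docker run --gpus all --privileged sandbox-cuda:latest
```

Keeping the command as a list makes it easy to hand to `subprocess.run(argv, check=True)` without shell-quoting issues.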
k
We will need to run it on a cloud instance with a GPU 🙂
j
are y’all running this on developer machines @Björn?
Just curious whether users get GPU-enabled dev machines, run their own hardware, or get a GPU-enabled VM in the cloud.
also some additional related docs: https://k3d.io/v5.4.6/usage/advanced/cuda/
b
@jeev During development I have been running this on a workstation at home... Hopefully our data scientists can run it either on their GPU-enabled workstations or on a Vertex AI Workbench instance in GCP with an attached T4 GPU.
I saw some references to local k3d development as well, but not until after I submitted the PR ^_^ I think the demo setup is really handy.
@jeev I put some links in the PR to where I found info... This one was a good one: https://itnext.io/enabling-nvidia-gpus-on-k3s-for-cuda-workloads-a11b96f967b0
j
awesome thanks. will take a look tomorrow 🙂