#ask-the-community

Björn

01/20/2023, 1:14 PM
Hello ^_^ Before trying this out myself, have there been any attempts/thoughts on enabling nvidia-acceleration for the single container demo cluster?

Ketan (kumare3)

01/20/2023, 2:26 PM
No, there have been no attempts. If you happen to do it, please share 🤯

Björn

01/22/2023, 3:24 PM
Ok, got it working finally... based the sandbox on nvidia/cuda:11.3.0-runtime-ubuntu20.04 instead of rancher/k3s:v1.24.4-k3s1 and installed k3s and crictl during docker build... haven't worked with docker buildx before, so not sure how to beautify it ^_^ will see if I can clean it up a little bit further
The sandbox-cuda container comes in at a hefty 3.7GB, but will come in handy for our engineers I hope. Really nice sandbox you got there! 🙂

Ketan (kumare3)

01/22/2023, 4:02 PM
Cc @jeev / @Yee
@Björn happy to host it with the core Flyte sandbox
And also add flytectl demo start --gpu?

jeev

01/22/2023, 4:13 PM
Neat. We could also possibly consider a “multi-node” setup where one of the nodes has GPU drivers and can be fired up in an opt-in way. GPU workloads can then be configured to schedule on this node via the usual affinity.
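For readers unfamiliar with "the usual affinity": a minimal sketch, in Python, of the Kubernetes node-affinity stanza that would pin a pod to a GPU node. The label key and value (`example.com/accelerator`, `nvidia`) and the function name are hypothetical; nothing in this thread says which label the opt-in GPU node would carry.

```python
import json


def gpu_node_affinity(label_key: str, label_value: str) -> dict:
    """Build the `affinity` field of a pod spec that requires a labelled node.

    The label key/value are assumptions; use whatever label the GPU node
    actually carries in your cluster.
    """
    return {
        "nodeAffinity": {
            # Hard requirement: the pod is only scheduled on matching nodes.
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": label_key,
                                "operator": "In",
                                "values": [label_value],
                            }
                        ]
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical label, for illustration only.
    affinity = gpu_node_affinity("example.com/accelerator", "nvidia")
    print(json.dumps({"affinity": affinity}, indent=2))
```

The printed JSON can be pasted under `spec` in a pod manifest; the GPU node would need the same label applied for the pod to schedule.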

Björn

01/22/2023, 5:48 PM
How would you prefer to have it delivered? As a PR or just as files that you can fit into the build system according to your wishes? If PR, should I add it as a build target for the Makefile in sandbox-bundled or make a new sandbox directory, do you think?

jeev

01/22/2023, 6:13 PM
i’m thinking probably a build target in sandbox-bundled. if you can open a draft PR, we can discuss further. Perhaps we can reorganize the current default stage to be based on some version of ubuntu that would make layering GPU drivers+cuda on easier too.

Björn

01/22/2023, 6:14 PM
Will do!
Ack, realised I forgot to add the k3s config for the local repo 😕
ah well, let's see what you think about the PR first ^_^

Ketan (kumare3)

01/22/2023, 9:29 PM
cc @jeev / @Yee will need your expertise

Björn

01/22/2023, 9:36 PM
To run the image you need to have an nvidia-enabled docker... I did this by installing nvidia-docker2 (on Ubuntu 20.04) and using /etc/docker/daemon.json like so:
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
and then restart docker
also, during docker run you need to pass --gpus all as an arg
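The steps above can be sketched in Python: a small helper that merges the nvidia runtime entry into an existing daemon.json without clobbering other settings, and one that builds the matching docker run command. The function names, the `--rm` flag, the `log-driver` entry and the `nvidia-smi` smoke test are illustrative assumptions, not part of the sandbox PR.

```python
import json


def add_nvidia_runtime(daemon_config: dict) -> dict:
    """Return a copy of a Docker daemon config with the nvidia runtime added.

    Existing keys and any other registered runtimes are preserved.
    """
    merged = dict(daemon_config)
    runtimes = dict(merged.get("runtimes", {}))
    runtimes["nvidia"] = {"path": "nvidia-container-runtime", "runtimeArgs": []}
    merged["runtimes"] = runtimes
    return merged


def gpu_run_command(image: str, *args: str) -> list:
    """Build a `docker run` argument list that exposes all GPUs.

    `--rm` is an assumption added so the test container cleans up after itself.
    """
    return ["docker", "run", "--rm", "--gpus", "all", image, *args]


if __name__ == "__main__":
    # Hypothetical pre-existing config; on a real host, read
    # /etc/docker/daemon.json instead.
    existing = {"log-driver": "json-file"}
    print(json.dumps(add_nvidia_runtime(existing), indent=2))
    # Assumed smoke test: run nvidia-smi inside the CUDA base image.
    print(" ".join(gpu_run_command("nvidia/cuda:11.3.0-runtime-ubuntu20.04", "nvidia-smi")))
```

Writing the merged config back to /etc/docker/daemon.json needs root, and Docker has to be restarted afterwards, as noted above.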

Ketan (kumare3)

01/22/2023, 9:43 PM
we will need to run on a cloud instance with gpu 🙂

jeev

01/22/2023, 11:14 PM
are y’all running this on developer machines @Björn?
just curious if users get gpu-enabled dev machines, run their own hardware, or get a gpu-enabled VM in the cloud.
also some additional related docs: https://k3d.io/v5.4.6/usage/advanced/cuda/

Björn

01/23/2023, 6:08 AM
@jeev During development I have been running this on a workstation at home... Hopefully our data scientists can run this either on their GPU-enabled workstations or on a Vertex AI Workbench in GCP with an attached T4 GPU
Saw some references to local k3d development as well, but not until I submitted the PR ^_^ I think the demo setup is really handy
@jeev I put some links in the PR from where I found info... This one was a nice one: https://itnext.io/enabling-nvidia-gpus-on-k3s-for-cuda-workloads-a11b96f967b0

jeev

01/23/2023, 6:14 AM
awesome thanks. will take a look tomorrow 🙂