Flyte #flyte-support

quick-salesclerk-18019

01/20/2023, 1:14 PM

Hello ^_^ Before trying this out myself, have there been any attempts/thoughts on enabling nvidia-acceleration for the single container demo cluster?

freezing-airport-6809

01/20/2023, 2:26 PM

No, there have been no attempts. If you happen to do it, please share 🤯

👍 1

quick-salesclerk-18019

01/22/2023, 3:24 PM

Ok, got it working finally... based the sandbox on

nvidia/cuda:11.3.0-runtime-ubuntu20.04

instead of

rancher/k3s:v1.24.4-k3s1

and installed k3s and crictl during docker build... haven't worked with docker buildx before, so not sure how to beautify it ^_^ will see if I can clean it up a little bit further

❤️ 1

quick-salesclerk-18019

01/22/2023, 3:28 PM

The sandbox-cuda container comes in at a hefty 3.7GB, but will come in handy for our engineers I hope. Really nice sandbox you got there! 🙂

freezing-airport-6809

01/22/2023, 4:02 PM

Cc @freezing-boots-56761 / @thankful-minister-83577

freezing-airport-6809

01/22/2023, 4:02 PM

@quick-salesclerk-18019 happy to host it with the core Flyte sandbox

freezing-airport-6809

01/22/2023, 4:03 PM

And also add flytectl demo start --gpu?

❤️ 1

freezing-boots-56761

01/22/2023, 4:13 PM

Neat. We could also possibly consider a “multi-node” setup where one of the nodes has GPU drivers and can be fired up in an opt-in way. GPU workloads can then be configured to schedule on this node via the usual affinity.

❤️ 1
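
A minimal sketch of what "the usual affinity" could look like for such an opt-in GPU node, written as a Python dict that mirrors a Kubernetes pod spec fragment. The node label `example.com/gpu-node` and the container name are hypothetical (the thread does not name either), and the `nvidia.com/gpu` limit assumes the NVIDIA device plugin is running on that node.

```python
import json

# Hypothetical node label; the thread does not say how the GPU node would be labelled.
GPU_NODE_LABEL = "example.com/gpu-node"


def gpu_pod_spec_fragment(gpu_count: int = 1) -> dict:
    """Return a pod-spec fragment that pins a container to GPU-labelled nodes."""
    return {
        "affinity": {
            "nodeAffinity": {
                # Hard requirement: only schedule on nodes carrying the label.
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {
                                    "key": GPU_NODE_LABEL,
                                    "operator": "In",
                                    "values": ["true"],
                                }
                            ]
                        }
                    ]
                }
            }
        },
        "containers": [
            {
                "name": "gpu-task",  # placeholder container name
                # Extended resource exposed by the NVIDIA device plugin (assumed installed).
                "resources": {"limits": {"nvidia.com/gpu": str(gpu_count)}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(gpu_pod_spec_fragment(), indent=2))
```

Pods without this affinity would keep landing on the default node, which is what makes the GPU node opt-in.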

quick-salesclerk-18019

01/22/2023, 5:48 PM

How would you prefer to have it delivered? As a PR or just as files that you can fit into the build system according to your wishes? If PR, should I add it as a build target for the Makefile in sandbox-bundled or make a new sandbox directory, do you think?

freezing-boots-56761

01/22/2023, 6:13 PM

i’m thinking probably a build target in sandbox-bundled. if you can open a draft PR, we can discuss further. Perhaps we can reorganize the current default stage to be based on some version of ubuntu that would make layering GPU drivers+cuda on easier too.

👍 2

quick-salesclerk-18019

01/22/2023, 6:14 PM

Will do!

quick-salesclerk-18019

01/22/2023, 8:49 PM

PR is here: https://github.com/flyteorg/flyte/pull/3256

quick-salesclerk-18019

01/22/2023, 9:12 PM

Ack, realised I forgot to add the k3s config for the local repo 😕

quick-salesclerk-18019

01/22/2023, 9:14 PM

ah well, let's see what you think about the PR first ^_^

freezing-airport-6809

01/22/2023, 9:29 PM

cc @freezing-boots-56761 / @thankful-minister-83577 will need your expertise

quick-salesclerk-18019

01/22/2023, 9:36 PM

To run the image you need an nvidia-enabled docker... I did this by installing (in ubuntu 20.04)

nvidia-docker2

and using

/etc/docker/daemon.json

like so:

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

and then restart docker
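
If /etc/docker/daemon.json already holds other settings, the runtime entry has to be merged in rather than pasted over the file. A rough sketch of that merge, operating on already-parsed JSON; the helper name `with_nvidia_runtime` and the sample `log-driver` setting are made up for illustration, and the file read/write and docker restart are left out.

```python
import copy
import json

# Runtime entry exactly as given in the message above.
NVIDIA_RUNTIME = {"path": "nvidia-container-runtime", "runtimeArgs": []}


def with_nvidia_runtime(daemon_config: dict) -> dict:
    """Return a copy of a parsed daemon.json with the nvidia runtime added.

    Existing keys and any other registered runtimes are preserved.
    """
    merged = copy.deepcopy(daemon_config)
    merged.setdefault("runtimes", {})["nvidia"] = copy.deepcopy(NVIDIA_RUNTIME)
    return merged


if __name__ == "__main__":
    existing = {"log-driver": "json-file"}  # hypothetical pre-existing setting
    print(json.dumps(with_nvidia_runtime(existing), indent=2))
```

The input dict is left untouched, so the result can be compared against the original before anything is written back.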

quick-salesclerk-18019

01/22/2023, 9:37 PM

more info: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html

quick-salesclerk-18019

01/22/2023, 9:38 PM

also, during docker run you need to pass

--gpus all

as an arg
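
For scripting this, a small sketch that assembles the docker run argument list with --gpus all in place. The image tag `sandbox-cuda:latest` is a placeholder, not the published name, and any other flags the sandbox itself needs are not covered here.

```python
import shlex
from typing import List


def gpu_docker_run_command(image: str, *extra_args: str) -> List[str]:
    """Build a `docker run` argv that exposes all host GPUs to the container."""
    # Docker options must precede the image name.
    return ["docker", "run", "--gpus", "all", *extra_args, image]


if __name__ == "__main__":
    # Placeholder image tag; substitute whatever the build actually produces.
    cmd = gpu_docker_run_command("sandbox-cuda:latest", "--rm")
    print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run` directly, which avoids shell quoting issues.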

freezing-airport-6809

01/22/2023, 9:43 PM

we will need to run on a cloud instance with gpu 🙂

👍 1

freezing-boots-56761

01/22/2023, 11:14 PM

are y’all running this on developer machines @quick-salesclerk-18019?

freezing-boots-56761

01/22/2023, 11:16 PM

just curious if users get gpu-enabled dev machines, run their own hardware, or get a gpu-enabled VM in the cloud.

freezing-boots-56761

01/22/2023, 11:20 PM

also some additional related docs: https://k3d.io/v5.4.6/usage/advanced/cuda/

quick-salesclerk-18019

01/23/2023, 6:08 AM

@freezing-boots-56761 During development I have been running this on a workstation at home... Hopefully our data scientists can run this either on their GPU-enabled workstations or on a Vertex AI workbench in GCP with an attached T4 GPU

quick-salesclerk-18019

01/23/2023, 6:09 AM

Saw some references to local k3d development as well, but not until I submitted the PR ^_^ I think the demo setup is really handy

quick-salesclerk-18019

01/23/2023, 6:10 AM

@freezing-boots-56761 I put some links in the PR from where I found info... This one was a nice one: https://itnext.io/enabling-nvidia-gpus-on-k3s-for-cuda-workloads-a11b96f967b0

freezing-boots-56761

01/23/2023, 6:14 AM

awesome thanks. will take a look tomorrow 🙂

👍 1
