# ask-the-community
j
👋 Anybody running tasks on GPUs? I am trying this simple example:
```python
from flytekit import ImageSpec, Resources, task, workflow

import torch

custom_image = ImageSpec(
    name="flyte",
    cuda="11.2.2",
    cudnn="8",
    python_version="3.9",
    packages=[
        "torch==2.0.1",
    ],
    registry="878877078763.dkr.ecr.us-east-1.amazonaws.com",
)


@task(
    container_image=custom_image,
    requests=Resources(cpu="1", mem="1Gi", gpu="1"),
    limits=Resources(cpu="1", mem="1Gi", gpu="1"),
)
def run_gpu_task():
    cuda_available = torch.cuda.is_available()
    print(f"Is cuda available? {cuda_available}")
    if cuda_available:
        print("__CUDNN VERSION:", torch.backends.cudnn.version())
        print("__Number CUDA Devices:", torch.cuda.device_count())
        print("__CUDA Device Name:", torch.cuda.get_device_name(0))
        print("__CUDA Device Total Memory [GB]:", torch.cuda.get_device_properties(0).total_memory / 1e9)


@workflow
def gpu_example():
    run_gpu_task()
```
The task manages to spin up a GPU node, but I get an error:
```
==========
== CUDA ==
==========

CUDA Version 11.2.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: pyflyte-fast-execute: not found
```
Which suggests some Flyte dependencies are missing from the image built above?
n
Hi Jose, have you tried `docker run -it` into the image and checking if flytekit is installed? If you specify `cuda` in the image spec, then `ubuntu20.04` is used as the base image (see here), not the default base image shipped with flytekit. In that case you'll need to add `flytekit` to the `packages` arg.
j
Sounds like that is what is happening, I will try! Thanks
Worked:
```python
from flytekit import ImageSpec, Resources, task, workflow

import torch

custom_image = ImageSpec(
    name="flyte",
    cuda="11.2.2",
    cudnn="8",
    python_version="3.9",
    packages=[
        "flytekit==1.9.1",
        "torch==2.0.1",
    ],
    registry="...",
)


@task(
    container_image=custom_image,
    requests=Resources(cpu="1", mem="1Gi", gpu="1"),
    limits=Resources(cpu="1", mem="1Gi", gpu="1"),
)
def run_gpu_task():
    cuda_available = torch.cuda.is_available()
    print(f"Is cuda available? {cuda_available}")
    if cuda_available:
        print("__CUDNN VERSION:", torch.backends.cudnn.version())
        print("__Number CUDA Devices:", torch.cuda.device_count())
        print("__CUDA Device Name:", torch.cuda.get_device_name(0))
        print("__CUDA Device Total Memory [GB]:", torch.cuda.get_device_properties(0).total_memory / 1e9)


@workflow
def gpu_example():
    run_gpu_task()
```
n
How can we make this experience somewhat more intuitive? E.g.:
• More magic: we inject `flytekit` into the `packages` argument by default?
• Less magic: when `cuda` is not None, raise a warning that `flytekit` won't be installed in the base image and you need to specify it in `packages`
@Kevin Su @Eduardo Apolinario (eapolinario)
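For concreteness, the "less magic" option could look something like this sketch. The function name and warning text are made up for illustration; this is not flytekit's actual code, and it only recognizes bare names or exact pins like `flytekit==1.9.1`:

```python
import warnings
from typing import List, Optional


def warn_if_flytekit_missing(cuda: Optional[str], packages: List[str]) -> None:
    """Hypothetical check: warn when cuda is set but flytekit is not in packages.

    Naive matching: only handles bare 'flytekit' or exact pins ('flytekit==x.y.z').
    """
    has_flytekit = any(p.split("==")[0].strip() == "flytekit" for p in packages)
    if cuda is not None and not has_flytekit:
        warnings.warn(
            "cuda is set, so the nvidia base image is used and flytekit will not "
            "be installed by default; add 'flytekit' to packages or the task "
            "entrypoint (pyflyte-fast-execute) will be missing in the container."
        )
```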
j
As a user, I would rather have the first option. It would be the same experience as building a regular image for CPU.
e
one question for the "more magic" approach: which version of flytekit do we install?
k
I think we can use the same version of flytekit they are using locally.
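Matching the locally installed version could be done with the stdlib `importlib.metadata`; the helper below is just a sketch of the idea, not flytekit code:

```python
import importlib.metadata


def pinned_requirement(package: str) -> str:
    """Return a pip requirement pinning `package` to the locally installed
    version, e.g. 'flytekit==1.9.1' if that's what the user runs locally."""
    version = importlib.metadata.version(package)  # raises if not installed
    return f"{package}=={version}"
```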
n
> I think we can use the same version of flytekit they are using locally.
I think as long as we `log.info` this at build time it'll be transparent