# ask-the-community
j
👋 Anybody running tasks on GPUs? I am trying this simple example:
```python
from flytekit import ImageSpec, Resources, task, workflow

import torch

custom_image = ImageSpec(
    name="flyte",
    cuda="11.2.2",
    cudnn="8",
    python_version="3.9",
    packages=[
        "torch==2.0.1",
    ],
    registry="878877078763.dkr.ecr.us-east-1.amazonaws.com",
)


@task(
    container_image=custom_image,
    requests=Resources(cpu="1", mem="1Gi", gpu="1"),
    limits=Resources(cpu="1", mem="1Gi", gpu="1"),
)
def run_gpu_task():
    cuda_available = torch.cuda.is_available()
    print(f"Is cuda available? {cuda_available}")
    if cuda_available:
        print("__CUDNN VERSION:", torch.backends.cudnn.version())
        print("__Number CUDA Devices:", torch.cuda.device_count())
        print("__CUDA Device Name:", torch.cuda.get_device_name(0))
        print("__CUDA Device Total Memory [GB]:", torch.cuda.get_device_properties(0).total_memory / 1e9)


@workflow
def gpu_example():
    run_gpu_task()
```
The task manages to spin up a GPU node, but I get an error:
```
==========
== CUDA ==
==========

CUDA Version 11.2.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: pyflyte-fast-execute: not found
```
Which suggests some Flyte dependencies are missing from the image built above?
n
Hi Jose, have you tried `docker run -it` into the image and checking if flytekit is installed? If you specify `cuda` in the image spec, then `ubuntu20.04` is used as the base image (see here), not the default base image shipped with flytekit. In that case you'll need to add `flytekit` to the `packages` arg.
j
Sounds like that is what is happening, I will try! Thanks
Worked:
```python
from flytekit import ImageSpec, Resources, task, workflow

import torch

custom_image = ImageSpec(
    name="flyte",
    cuda="11.2.2",
    cudnn="8",
    python_version="3.9",
    packages=[
        "flytekit==1.9.1",
        "torch==2.0.1",
    ],
    registry="...",
)


@task(
    container_image=custom_image,
    requests=Resources(cpu="1", mem="1Gi", gpu="1"),
    limits=Resources(cpu="1", mem="1Gi", gpu="1"),
)
def run_gpu_task():
    cuda_available = torch.cuda.is_available()
    print(f"Is cuda available? {cuda_available}")
    if cuda_available:
        print("__CUDNN VERSION:", torch.backends.cudnn.version())
        print("__Number CUDA Devices:", torch.cuda.device_count())
        print("__CUDA Device Name:", torch.cuda.get_device_name(0))
        print("__CUDA Device Total Memory [GB]:", torch.cuda.get_device_properties(0).total_memory / 1e9)


@workflow
def gpu_example():
    run_gpu_task()
```
n
How can we make this experience somewhat more intuitive? E.g.:
• More magic: we inject `flytekit` into the `packages` argument by default?
• Less magic: when `cuda` is not None, raise a warning that `flytekit` won't be installed in the base image and you need to specify it in `packages`
@Kevin Su @Eduardo Apolinario (eapolinario)
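For concreteness, the "less magic" option could look something like this sketch. The function name and warning text are made up for illustration; this is not flytekit's actual code, and it only recognizes bare names or exact pins like `flytekit==1.9.1`:

```python
import warnings
from typing import List, Optional


def warn_if_flytekit_missing(cuda: Optional[str], packages: List[str]) -> None:
    """Hypothetical check: warn when cuda is set but flytekit is not in packages.

    Naive matching: only handles bare 'flytekit' or exact pins ('flytekit==x.y.z').
    """
    has_flytekit = any(p.split("==")[0].strip() == "flytekit" for p in packages)
    if cuda is not None and not has_flytekit:
        warnings.warn(
            "cuda is set, so the nvidia base image is used and flytekit will not "
            "be installed by default; add 'flytekit' to packages or the task "
            "entrypoint (pyflyte-fast-execute) will be missing in the container."
        )
```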
j
As a user, I would rather have the first option. It would be the same experience as building a regular image for CPU.
e
one question for the "more magic" approach: which version of flytekit do we install?
k
I think we can use the same version of flytekit they are using locally.
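Matching the locally installed version could be done with the stdlib `importlib.metadata`; the helper below is just a sketch of the idea, not flytekit code:

```python
import importlib.metadata


def pinned_requirement(package: str) -> str:
    """Return a pip requirement pinning `package` to the locally installed
    version, e.g. 'flytekit==1.9.1' if that's what the user runs locally."""
    version = importlib.metadata.version(package)  # raises if not installed
    return f"{package}=={version}"
```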
n
> I think we can use the same version of flytekit they are using locally.
I think as long as we `log.info` this at build time it'll be transparent