handsome-sandwich-69169
10/31/2023, 3:04 PM
from flytekit import Resources, task, workflow, ImageSpec
import torch

custom_image = ImageSpec(
    name="flyte",
    cuda="11.2.2",
    cudnn="8",
    python_version="3.9",
    packages=[
        "torch==2.0.1",
    ],
    registry="878877078763.dkr.ecr.us-east-1.amazonaws.com",
)
@task(
    container_image=custom_image,
    requests=Resources(cpu="1", mem="1Gi", gpu="1"),
    limits=Resources(cpu="1", mem="1Gi", gpu="1"),
)
def run_gpu_task():
    cuda_available = torch.cuda.is_available()
    print(f"Is cuda available? {cuda_available}")
    if cuda_available:
        print('__CUDNN VERSION:', torch.backends.cudnn.version())
        print('__Number CUDA Devices:', torch.cuda.device_count())
        print('__CUDA Device Name:', torch.cuda.get_device_name(0))
        print('__CUDA Device Total Memory [GB]:', torch.cuda.get_device_properties(0).total_memory / 1e9)
    return
@workflow
def gpu_example():
    run_gpu_task()

The task manages to spin up a GPU node, but I get an error:
==========
== CUDA ==
==========
CUDA Version 11.2.2
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: pyflyte-fast-execute: not found
Which sounds like some dependencies for Flyte are missing from the image built above?
broad-monitor-993
10/31/2023, 3:47 PM
Have you tried docker run -it into the image and checking if flytekit is installed? If you specify cuda in the image spec then ubuntu20.04 is used as the base image (see here), not the default base image shipped with flytekit.
broad-monitor-993
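The failing exec of pyflyte-fast-execute in the error log means flytekit's console scripts are not on the container's PATH. A quick sanity check that can be run from a Python shell inside the image (a sketch using only the standard library; the binary name comes from the error message above):

```python
import shutil

# pyflyte-fast-execute is the console script the Flyte pod tries to exec;
# it is installed alongside the flytekit package. If shutil.which cannot
# find it, flytekit (or its entrypoints) is missing from the image.
entrypoint = shutil.which("pyflyte-fast-execute")
if entrypoint is None:
    print("flytekit entrypoints not on PATH -- flytekit is likely not installed")
else:
    print(f"found: {entrypoint}")
```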
10/31/2023, 3:47 PM
You also need to include flytekit in the packages arg.
handsome-sandwich-69169
10/31/2023, 3:48 PM
handsome-sandwich-69169
11/01/2023, 9:39 AM
from flytekit import Resources, task, workflow, ImageSpec
import torch

custom_image = ImageSpec(
    name="flyte",
    cuda="11.2.2",
    cudnn="8",
    python_version="3.9",
    packages=[
        "flytekit==1.9.1",
        "torch==2.0.1",
    ],
    registry="...",
)
@task(
    container_image=custom_image,
    requests=Resources(cpu="1", mem="1Gi", gpu="1"),
    limits=Resources(cpu="1", mem="1Gi", gpu="1"),
)
def run_gpu_task():
    cuda_available = torch.cuda.is_available()
    print(f"Is cuda available? {cuda_available}")
    if cuda_available:
        print('__CUDNN VERSION:', torch.backends.cudnn.version())
        print('__Number CUDA Devices:', torch.cuda.device_count())
        print('__CUDA Device Name:', torch.cuda.get_device_name(0))
        print('__CUDA Device Total Memory [GB]:', torch.cuda.get_device_properties(0).total_memory / 1e9)
    return
@workflow
def gpu_example():
    run_gpu_task()
broad-monitor-993
11/01/2023, 12:46 PM
• Add flytekit into the packages argument by default?
• Less magic: when cuda is not None, raise a warning that flytekit won’t be installed in the base image and you need to specify it in packages
@glamorous-carpet-83516 @high-accountant-32689
handsome-sandwich-69169
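The "less magic" option above could be sketched as a build-time check along these lines (a hypothetical helper, not flytekit's actual ImageSpec validation; the function name and signature are invented for illustration):

```python
import warnings


def check_flytekit_pinned(cuda, packages):
    """Warn when a CUDA base image is requested but flytekit is not in packages.

    Hypothetical sketch mirroring the proposal above; the real validation
    inside flytekit's ImageSpec may differ.
    """
    if cuda is None:
        return
    pkgs = packages or []
    # Compare only the package name, ignoring any "==<version>" pin.
    if not any(p.split("==")[0].strip() == "flytekit" for p in pkgs):
        warnings.warn(
            "cuda is set, so a CUDA base image is used and flytekit will not "
            "be installed by default; add flytekit to `packages`."
        )
```

Emitting this as a warning at build time would have made the pyflyte-fast-execute failure above diagnosable before the task ever reached the cluster.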
11/01/2023, 12:49 PM
high-accountant-32689
11/01/2023, 6:04 PM
glamorous-carpet-83516
11/03/2023, 6:33 PM
broad-monitor-993
11/06/2023, 4:17 PM
"I think we can use the same version of flytekit they are using locally."
I think as long as we log.info this at build time it’ll be transparent.
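Pinning the locally installed flytekit version into packages, as suggested above, could look something like this (a hypothetical helper using only the standard library; not flytekit's actual behavior):

```python
from importlib.metadata import PackageNotFoundError, version


def pin_local_version(package, packages):
    """Append `package==<locally installed version>` unless already pinned.

    Hypothetical sketch of the idea above: reuse the version the user is
    running locally when building the image, so local and remote match.
    """
    # Respect an explicit pin the user already provided.
    if any(p.split("==")[0].strip() == package for p in packages):
        return packages
    try:
        return packages + [f"{package}=={version(package)}"]
    except PackageNotFoundError:
        # Package not installed locally; leave the list unchanged.
        return packages
```

Logging the resolved pin at build time, as suggested, keeps the behavior transparent to the user.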