handsome-sandwich-69169
10/31/2023, 3:04 PM
from flytekit import Resources, task, workflow, ImageSpec
import torch
custom_image = ImageSpec(
    name="flyte",
    cuda="11.2.2",
    cudnn="8",
    python_version="3.9",
    packages=[
        "torch==2.0.1",
    ],
    registry="878877078763.dkr.ecr.us-east-1.amazonaws.com",
)
@task(
    container_image=custom_image,
    requests=Resources(cpu="1", mem="1Gi", gpu="1"),
    limits=Resources(cpu="1", mem="1Gi", gpu="1"),
)
def run_gpu_task():
    cuda_available = torch.cuda.is_available()
    print(f"Is cuda available? {cuda_available}")
    if cuda_available:
        print('__CUDNN VERSION:', torch.backends.cudnn.version())
        print('__Number CUDA Devices:', torch.cuda.device_count())
        print('__CUDA Device Name:', torch.cuda.get_device_name(0))
        print('__CUDA Device Total Memory [GB]:', torch.cuda.get_device_properties(0).total_memory / 1e9)
    return


@workflow
def gpu_example():
    run_gpu_task()
The task manages to spin up a GPU node, but I get an error:
==========
== CUDA ==
==========
CUDA Version 11.2.2
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: pyflyte-fast-execute: not found
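One way to confirm what the error message implies is a quick check run from inside the built image, verifying that flytekit is importable and that the missing entrypoint is on PATH. A minimal sketch (the file name `check.py` and how you invoke it, e.g. via `docker run -it <image> python check.py`, are assumptions, not part of the thread):

```python
# check.py -- a sanity check to run inside the built image.
# It verifies that flytekit is importable and that the pyflyte-fast-execute
# entrypoint (the binary the error above says is missing) is on PATH.
import importlib.util
import shutil

flytekit_installed = importlib.util.find_spec("flytekit") is not None
entrypoint = shutil.which("pyflyte-fast-execute")

print("flytekit importable:", flytekit_installed)
print("pyflyte-fast-execute on PATH:", entrypoint)
```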
Which sounds like some dependencies for Flyte are missing from the image built above?

broad-monitor-993
10/31/2023, 3:47 PM
Have you tried doing a docker run -it into the image and checking if flytekit is installed? If you specify cuda in the image spec then ubuntu20.04 is used as the base image (see here), not the default base image shipped with flytekit.

broad-monitor-993
10/31/2023, 3:47 PM
You can add flytekit in the packages arg

handsome-sandwich-69169
10/31/2023, 3:48 PM

handsome-sandwich-69169
11/01/2023, 9:39 AM
from flytekit import Resources, task, workflow, ImageSpec
import torch
custom_image = ImageSpec(
    name="flyte",
    cuda="11.2.2",
    cudnn="8",
    python_version="3.9",
    packages=[
        "flytekit==1.9.1",
        "torch==2.0.1",
    ],
    registry="...",
)
@task(
    container_image=custom_image,
    requests=Resources(cpu="1", mem="1Gi", gpu="1"),
    limits=Resources(cpu="1", mem="1Gi", gpu="1"),
)
def run_gpu_task():
    cuda_available = torch.cuda.is_available()
    print(f"Is cuda available? {cuda_available}")
    if cuda_available:
        print('__CUDNN VERSION:', torch.backends.cudnn.version())
        print('__Number CUDA Devices:', torch.cuda.device_count())
        print('__CUDA Device Name:', torch.cuda.get_device_name(0))
        print('__CUDA Device Total Memory [GB]:', torch.cuda.get_device_properties(0).total_memory / 1e9)
    return


@workflow
def gpu_example():
    run_gpu_task()

broad-monitor-993
11/01/2023, 12:46 PM
• More magic: add flytekit into the packages argument by default
• Less magic: when cuda is not None, raise a warning that flytekit won’t be installed in the base image and you need to specify it in packages
@glamorous-carpet-83516 @high-accountant-32689

handsome-sandwich-69169
11/01/2023, 12:49 PM

high-accountant-32689
11/01/2023, 6:04 PM

glamorous-carpet-83516
11/03/2023, 6:33 PM

broad-monitor-993
11/06/2023, 4:17 PM
"I think we can use the same version of flytekit they are using locally."
I think as long as we log.info this at build time it'll be transparent
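The behavior discussed above could be sketched roughly like this: inject the locally installed flytekit version into the package list and log it at build time (the "more magic" option), falling back to the "less magic" warning when no local flytekit can be detected. This is purely a hypothetical illustration; resolve_packages and its signature are made up here and are not the actual flytekit implementation:

```python
import importlib.metadata
import logging
import warnings
from typing import List, Optional

logger = logging.getLogger(__name__)


def resolve_packages(cuda: Optional[str], packages: List[str]) -> List[str]:
    """Hypothetical sketch of the proposed behavior; not real flytekit code."""
    has_flytekit = any(p.partition("==")[0] == "flytekit" for p in packages)
    if cuda is None or has_flytekit:
        return packages
    try:
        # "More magic": pin the flytekit version installed locally and
        # log.info it at build time so the injection stays transparent.
        local_version = importlib.metadata.version("flytekit")
        logger.info("Injecting flytekit==%s into image packages", local_version)
        return packages + [f"flytekit=={local_version}"]
    except importlib.metadata.PackageNotFoundError:
        # "Less magic" fallback: warn that the CUDA base image lacks flytekit.
        warnings.warn(
            "cuda is set, so the default flytekit base image is not used; "
            "add flytekit to `packages` or the task entrypoint will be missing."
        )
        return packages
```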