# flyte-support
p
Losing my nerves over this for the whole day, maybe someone here has an idea:
• I can't get a GPU available within the Flyte task, i.e. `torch.cuda.is_available()` returns `False`.
• However, if I `kubectl exec` into the same pod with `/bin/bash`, `torch.cuda.is_available()` returns `True`.
What could be possible reasons for this? I am using a custom Docker image for registering tasks/workflows; the image is based on `python:3.10-slim` and has `flytekit-1.13.5` installed in it.
f
Different torch versions, and a mismatch with CUDA?
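One way to rule out a version mismatch is to print, once from inside the task and once from the `kubectl exec` shell, which torch build is installed and which CUDA version it was compiled against. A minimal sketch; `torch_cuda_report` is a hypothetical helper name, and torch is treated as optional so the script also runs where it is missing:

```python
import importlib.util


def torch_cuda_report() -> dict:
    """Summarize the installed torch build and whether it can see a GPU.

    torch is imported lazily, so the report still works (and says so)
    in an environment where torch is not installed.
    """
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch  # only imported when it is actually present

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was compiled against; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is the same in both runs and only `cuda_available` differs, the torch build itself is probably not the culprit.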
p
I checked the Python binary and the torch location, and they are the same in both cases:
```python
import sys
import torch

print("torch_location:", torch.__file__)
print("python:", sys.executable)
```
Well, anyway, I've used a pytorch-cuda base image instead, and that got Flyte to pick up CUDA. Still, I find it strange that I was able to use a GPU to train a model in the same image, just outside Flyte.
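Since the pod is the same, the interpreter path alone may not be enough; a broader snapshot taken in both contexts (inside the task and in the `kubectl exec` shell) can then be compared field by field. A minimal sketch; the list of watched variables and the helper names `snapshot`, `_flatten` and `diff` are assumptions for illustration, not part of Flyte or torch:

```python
import json
import os
import sys

# Environment variables that commonly affect GPU visibility and shared-library
# lookup. The selection is an assumption; extend it as needed.
WATCHED_VARS = (
    "CUDA_VISIBLE_DEVICES",
    "NVIDIA_VISIBLE_DEVICES",
    "NVIDIA_DRIVER_CAPABILITIES",
    "LD_LIBRARY_PATH",
    "PATH",
)


def snapshot() -> dict:
    """Collect interpreter, user and GPU-related environment details."""
    return {
        "python": sys.executable,
        "uid": os.getuid(),
        "gid": os.getgid(),
        "env": {name: os.environ.get(name) for name in WATCHED_VARS},
    }


def _flatten(snap: dict) -> dict:
    """Merge the nested "env" mapping into the top level for comparison."""
    flat = {key: value for key, value in snap.items() if key != "env"}
    flat.update(snap.get("env", {}))
    return flat


def diff(a: dict, b: dict) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    flat_a, flat_b = _flatten(a), _flatten(b)
    return {
        key: (flat_a.get(key), flat_b.get(key))
        for key in sorted(flat_a.keys() | flat_b.keys())
        if flat_a.get(key) != flat_b.get(key)
    }


if __name__ == "__main__":
    # JSON output can be saved from both runs and loaded back for diff().
    print(json.dumps(snapshot(), indent=2))
```

Loading the two saved JSON files with `json.load` and passing them to `diff` leaves only the fields that actually differ between the two contexts.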
f
It has to be a difference in the library loading path.
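To check the library-loading theory directly, one can ask the dynamic loader whether it can open the CUDA driver library from each context. A minimal sketch, assuming the driver library is named `libcuda.so.1` (the usual name on Linux) and that it is provided by the NVIDIA driver on the node rather than by anything installed in the image; `probe_libcuda` is a hypothetical helper:

```python
import ctypes
import ctypes.util
import os


def probe_libcuda() -> dict:
    """Report whether the dynamic loader can find and open the CUDA driver library."""
    result = {
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
        # On Linux, find_library consults the ldconfig cache plus a few fallbacks
        "find_library": ctypes.util.find_library("cuda"),
        "loaded": False,
        "error": None,
    }
    try:
        ctypes.CDLL("libcuda.so.1")
        result["loaded"] = True
    except OSError as exc:  # raised when the library cannot be located or opened
        result["error"] = str(exc)
    return result


if __name__ == "__main__":
    print(probe_libcuda())
```

If `loaded` is `True` in the `kubectl exec` shell but `False` inside the task, the two processes are resolving shared libraries differently.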
p
I swear, it was not.
f
Ohh, I believe you. I am just saying how this can manifest.
p
Right, I was going after the same thing.
My coworker suggested checking which user executes the task and the permissions related to it; that could also be it.
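The user/permissions idea can be checked by printing the current uid, gid and groups in both contexts and testing whether that user can open the NVIDIA device nodes. A minimal sketch, assuming the GPU shows up as `/dev/nvidia*` nodes inside the container (the usual layout, but an assumption here); `device_access` is a hypothetical helper:

```python
import glob
import os


def device_access(pattern: str = "/dev/nvidia*") -> dict:
    """Map each matching device node to whether the current user can read and write it."""
    return {
        path: os.access(path, os.R_OK | os.W_OK)
        for path in sorted(glob.glob(pattern))
    }


if __name__ == "__main__":
    print("uid:", os.getuid(), "gid:", os.getgid(), "groups:", os.getgroups())
    nodes = device_access()
    if not nodes:
        print("no /dev/nvidia* device nodes visible in this container")
    for path, ok in nodes.items():
        print(f"{path}: {'read/write OK' if ok else 'NO read/write access'}")
```

A `kubectl exec` shell often runs as root, so a task process running as a different user could in principle see the same nodes but lack access to them.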
f
Hmm, that’s interesting.
g
> I am using a custom docker image for registering tasks/workflows

Are you able to share the Dockerfile?
p
@worried-pager-82302 Sure, here is a somewhat anonymized version of it:
```dockerfile
FROM <some_private_dr> AS inventory
FROM python:3.10-slim

# --- Set up google cloud, python etc.
COPY --from=inventory /inventory/scripts /inventory/scripts
RUN bash /inventory/scripts/base_python.sh && \
    bash /inventory/scripts/google_cloud_sdk.sh && \
    bash /inventory/scripts/kubectl.sh

# Install lib dependencies
COPY pyproject.toml pyproject.toml
COPY poetry.lock poetry.lock
# Source (rather than execute) ~/.profile so its PATH changes apply to poetry below
RUN . ~/.profile && \
    poetry install --no-interaction --no-ansi --no-root && \
    poetry -n cache clear --all .

WORKDIR /workspace
ENV VENV=/opt/venv
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
ENV PYTHONPATH=/workspace

COPY module_code/ module_code

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE=$tag
```
Note that `poetry.lock` contains the following, which installs the CUDA dependencies:
```toml
[package.dependencies]
nvidia-cublas-cu11 = {version = "11.10.3.66", markers = "platform_system == \"Linux\""}
nvidia-cuda-nvrtc-cu11 = {version = "11.7.99", markers = "platform_system == \"Linux\""}
nvidia-cuda-runtime-cu11 = {version = "11.7.99", markers = "platform_system == \"Linux\""}
nvidia-cudnn-cu11 = {version = "8.5.0.96", markers = "platform_system == \"Linux\""}
```
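To confirm that these pinned wheels actually ended up in the interpreter the task runs with, the installed versions can be compared against the lock. A minimal sketch; `LOCKED` simply copies the versions from the `poetry.lock` excerpt and `check_locked` is a hypothetical helper. As far as I know these wheels ship CUDA user-space libraries (cuBLAS, cuDNN, NVRTC, the runtime) but not the driver library, which still has to come from the node:

```python
from importlib import metadata

# Versions copied from the poetry.lock excerpt in this thread.
LOCKED = {
    "nvidia-cublas-cu11": "11.10.3.66",
    "nvidia-cuda-nvrtc-cu11": "11.7.99",
    "nvidia-cuda-runtime-cu11": "11.7.99",
    "nvidia-cudnn-cu11": "8.5.0.96",
}


def check_locked(locked: dict) -> dict:
    """Return {name: (expected, installed_or_None)} for every mismatched distribution."""
    mismatches = {}
    for name, expected in locked.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed in this interpreter's environment
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    print(check_locked(LOCKED) or "all pinned nvidia-* wheels are installed at the locked versions")
```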