prehistoric-kangaroo-27449
09/26/2024, 1:59 PM
• torch.cuda.is_available() returns False
• However, if I kubectl exec /bin/bash into the same pod, torch.cuda.is_available() would return True
What could be the possible reasons for this?
I am using a custom docker image for registering tasks/workflows; the image is based on python:3.10-slim and has flytekit-1.13.5 installed into it.
freezing-airport-6809
prehistoric-kangaroo-27449
09/26/2024, 2:20 PM
print("torch_location:", torch.__file__)
print("python:", sys.executable)
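A slightly fuller version of the two prints above, as a minimal sketch: run it once inside the task and once from the kubectl exec shell, then compare the output line by line. The helper name collect_env_report and the choice of environment variables are assumptions for illustration; nothing here is part of flytekit.

```python
import importlib.util
import os
import sys


def collect_env_report():
    """Gather details that may differ between the task process and an exec shell."""
    report = {
        "python": sys.executable,
        # Variables that commonly influence how the GPU driver libraries are found.
        # Which of these matter in a given cluster is an assumption to verify.
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "PATH": os.environ.get("PATH"),
    }
    # Skip the torch details if this interpreter cannot see torch at all.
    if importlib.util.find_spec("torch") is None:
        report["torch_location"] = None
        return report

    import torch

    report["torch_location"] = torch.__file__
    report["torch_version"] = torch.__version__
    # None here means a CPU-only build of torch was imported.
    report["torch_cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

If python or torch_location differ between the two runs, the task is probably using a different interpreter or environment than the shell; if they match but the environment variables differ, the difference is more likely in how the process is launched.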
glamorous-carpet-83516
09/26/2024, 5:13 PM
"I am using a custom docker image for registering tasks/workflows"
Are you able to share the Dockerfile?
prehistoric-kangaroo-27449
09/26/2024, 5:46 PM
FROM <some_private_dr> as inventory
FROM python:3.10-slim
# --- Set up google cloud, python etc.
COPY --from=inventory /inventory/scripts /inventory/scripts
RUN bash /inventory/scripts/base_python.sh && \
bash /inventory/scripts/google_cloud_sdk.sh && \
bash /inventory/scripts/kubectl.sh
# Install lib dependencies
ADD pyproject.toml pyproject.toml
ADD poetry.lock poetry.lock
RUN ~/.profile && \
poetry install --no-interaction --no-ansi --no-root && \
poetry -n cache clear --all .
WORKDIR /workspace
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /workspace
ADD module_code/ module_code
# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag
Note that poetry.lock contains the following, which installs the CUDA dependencies:
[package.dependencies]
nvidia-cublas-cu11 = {version = "11.10.3.66", markers = "platform_system == \"Linux\""}
nvidia-cuda-nvrtc-cu11 = {version = "11.7.99", markers = "platform_system == \"Linux\""}
nvidia-cuda-runtime-cu11 = {version = "11.7.99", markers = "platform_system == \"Linux\""}
nvidia-cudnn-cu11 = {version = "8.5.0.96", markers = "platform_system == \"Linux\""}
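One way to confirm that these pinned wheels actually ended up in the interpreter that runs the task is to query the installed package metadata. This is only a sketch: the package names and versions are copied from the lock file excerpt above, and check_cuda_wheels is a hypothetical helper, not something poetry or flytekit provides.

```python
from importlib import metadata

# Names and versions copied from the poetry.lock excerpt above.
CUDA_WHEELS = {
    "nvidia-cublas-cu11": "11.10.3.66",
    "nvidia-cuda-nvrtc-cu11": "11.7.99",
    "nvidia-cuda-runtime-cu11": "11.7.99",
    "nvidia-cudnn-cu11": "8.5.0.96",
}


def check_cuda_wheels(expected=CUDA_WHEELS):
    """Map each package name to (installed_version_or_None, matches_expected)."""
    results = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not visible to this interpreter.
            installed = None
        results[name] = (installed, installed == wanted)
    return results


if __name__ == "__main__":
    for name, (installed, ok) in check_cuda_wheels().items():
        print(f"{name}: installed={installed} matches_lock={ok}")
```

Running this both in the task and in the kubectl exec shell shows whether the two processes see the same set of installed packages.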