# ask-the-community
f
Hello, I have some Python dependencies, including custom packages, that I want to use in my Flyte workflow. I want to try them out on a local Flyte cluster first. How do I install them on the local flytesnacks or demo cluster? For example, I have the dependencies scikit-learn and xgboost plus some custom packages in a local wheel file, which I can install with pip install -r requirements.txt --index-url https://maven.homebox.com/repository/max-pypi-snapshots/simple --extra-index-url https://maven.homebox.com/repository/max-pypi-releases/simple --use-pep517
k
You have to add the Python packages to your Dockerfile and build a new image for your task. Here is an example. After building the image, you can use
pyflyte run --remote --image <image_name> wf
to run the workflow.
f
Hi @Kevin Su, I successfully built the image with my custom package listed in requirements.txt, but I still got a ModuleNotFoundError for my custom package when running pyflyte run --remote --image <image_name> wf. The command works only after I pip install my custom package locally. Is this also your experience?
k
Could you show me your Dockerfile?
f
FROM python:3.8-slim-buster

WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root

RUN apt-get update && apt-get install -y build-essential git gcc

RUN git clone https://github.com/edenhill/librdkafka
WORKDIR /root/librdkafka/
RUN ./configure
RUN make
RUN make install
RUN ldconfig
WORKDIR /root

# Install the AWS cli separately to prevent issues with boto being written over
RUN pip3 install awscli
# Similarly, if you're using GCP be sure to update this command to install gsutil
# RUN curl -sSL https://sdk.cloud.google.com | bash
# ENV PATH="$PATH:/root/google-cloud-sdk/bin"

ENV VENV /opt/venv
# Virtual environment
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"

# Install Python dependencies
COPY ./requirements.txt /root
RUN pip install --upgrade pip
RUN pip install -r /root/requirements.txt --index-url https://maven.homebox.com/repository/max-pypi-snapshots/simple --extra-index-url https://maven.homebox.com/repository/max-pypi-releases/simple --use-pep517

# Copy the actual code
COPY . /root

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag
It’s based on an older version of flytesnacks/cookbook/Dockerfile.
k
Just to confirm, does requirements.txt include scikit-learn, xgboost, and the custom packages? Could I pull your image? I want to test it.
f
Yes. The image is in a private AWS ECR.
I have been able to use the image in FlyteAdmin. It’s just that I need to pip install the deps locally to run or register the workflow.
If I uninstall the deps locally, registration fails with ModuleNotFoundError.
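One way to see ahead of time which imports will not resolve on the local machine is a small check like the sketch below. It is not part of flytekit; the helper name `missing_modules` and the module list are assumptions made for illustration. Note that the check uses import names (for example, scikit-learn is imported as `sklearn`), not the distribution names from requirements.txt.

```python
import importlib.util


def missing_modules(import_names):
    """Return the top-level import names that cannot be found locally.

    Hypothetical helper: it only checks that each module can be located,
    it does not import or execute the module.
    """
    return [
        name
        for name in import_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # Assumed list of imports used by the workflow file; a custom
    # package would be added here under its own import name.
    needed = ["sklearn", "xgboost"]
    missing = missing_modules(needed)
    if missing:
        print("Not installed locally:", ", ".join(missing))
    else:
        print("All listed modules resolve locally.")
```

If the check reports missing modules, running or registering the workflow from this machine can be expected to hit the same ModuleNotFoundError.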
k
Ah, I see. You have to install those packages locally because flytekit loads the entire workflow file on your machine and compiles it during registration. If you don’t have those packages installed, flytekit will fail to compile it.
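The behaviour described here can be reproduced with plain Python, without flytekit. The sketch below loads a file by path, which runs all of its top-level statements, so a top-level import of a package that is missing locally fails before any function is called. The package name, module names, and helpers are hypothetical. The second variant, which defers the import into the function body, is a general Python technique shown only for contrast; the thread does not say whether it suits a given Flyte setup.

```python
import importlib.util
import tempfile
import textwrap
from pathlib import Path

# Hypothetical name standing in for a custom package that exists in the
# container image but is not installed on the local machine.
MISSING = "my_custom_pkg_not_installed_locally"


def load_module(source, name):
    """Write `source` to a temporary file and load it by path."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{name}.py"
        path.write_text(textwrap.dedent(source))
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes every top-level statement
        return module


def try_load(source, name):
    """Return "loaded" or a short description of the import failure."""
    try:
        load_module(source, name)
        return "loaded"
    except ModuleNotFoundError as exc:
        return f"ModuleNotFoundError: {exc.name}"


# Variant 1: the import sits at the top of the file.
top_level = f"""
    import {MISSING}

    def train():
        return {MISSING}.fit()
"""

# Variant 2: the import is only resolved when train() is called.
deferred = f"""
    def train():
        import {MISSING}
        return {MISSING}.fit()
"""

top_level_result = try_load(top_level, "wf_top_level")
deferred_result = try_load(deferred, "wf_deferred")
print(top_level_result)  # ModuleNotFoundError: my_custom_pkg_not_installed_locally
print(deferred_result)   # loaded
```

Installing the same requirements locally, as done in this thread, avoids the problem without changing the workflow code.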
f
That makes total sense. Thank you!