Hi, I've set up a flyte sandbox cluster in an EC2 i...
# ask-the-community
k
Hi, I've set up a Flyte sandbox cluster on an EC2 instance and created a custom Docker image with the required dependencies installed. But when I try to run any script, I get a ModuleNotFoundError, even though the module was installed when the image was built. The error goes away when I pip install that module on the instance. So, please clarify: when running a workflow, does Flyte Propeller look for the dependencies in the specified Docker container, or does it try to use the dependencies from the server it is running on?
s
This should work, @KS Tarun. Could you share with us the Dockerfile you’re using and the module that’s missing?
k
Dockerfile:
FROM ubuntu:focal

WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root

RUN : \
    && apt-get update \
    && apt install -y software-properties-common \
    && add-apt-repository ppa:deadsnakes/ppa

RUN : \
    && apt-get update \
    && apt-get install -y python3.8 python3-pip python3-venv make build-essential libssl-dev curl vim

RUN apt-get -y update
RUN apt-get -y install git

# This is necessary for opencv to work
RUN apt-get update && apt-get install -y libsm6 libxext6 libxrender-dev ffmpeg

# Install the AWS cli separately to prevent issues with boto being written over
RUN pip3 install awscli

WORKDIR /opt
RUN curl https://sdk.cloud.google.com > install.sh
RUN bash /opt/install.sh --install-dir=/opt
ENV PATH $PATH:/opt/google-cloud-sdk/bin
WORKDIR /root

# Virtual environment
ENV VENV /opt/venv
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"

# Install Python dependencies
COPY requirements.txt /root
RUN ${VENV}/bin/pip install -r /root/requirements.txt

# Copy the makefile targets to expose on the container. This makes it easier to register.
# COPY in_container.mk /root/Makefile
COPY sandbox.config /root

# Copy over the helper script that the SDK relies on
RUN cp ${VENV}/bin/flytekit_venv /usr/local/bin/
RUN chmod a+x /usr/local/bin/flytekit_venv

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag
requirements.txt:
flytekit
awscli
pymysql
scikit-learn
xgboost
sqlalchemy
s
What’s the module that’s missing?
k
xgboost, sqlalchemy, and any other library that the script requires but that is not installed on the instance all raise ModuleNotFoundError.
s
A silly question: are you using the custom docker image while registering your workflows? Can you double check that?
k
Yes, I'm using it.
s
Can you run a container using the custom docker image and check if those libraries are available?
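One way to do this check is to run a short Python script inside a container started from the custom image. The sketch below is only an illustration: the helper name missing_modules and the module list are assumptions based on the requirements.txt shared above (scikit-learn is imported as sklearn), and it only tells you what the interpreter that runs the script can see.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list, taken from the requirements.txt in this thread.
    required = ["flytekit", "pymysql", "sklearn", "xgboost", "sqlalchemy"]
    print("interpreter:", sys.executable)
    print("missing:", missing_modules(required) or "none")
```

If the script reports missing modules when run with the image's default python3 but not with the virtual environment's python, the packages are probably installed for a different interpreter than the one being used.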
k
Ok, I'll check.
@Samhita Alla All the required libraries are available in the custom docker image.
s
This is weird. It should work. Can you share with me the commands you ran to register your workflows? Also, aren’t you pushing your image to a registry? It seems like you’re using a locally available image.
k
To register workflow:
pyflyte --config ~/.flyte/config-sandbox.yaml register --image fibo:1.0 Test1.py
The image was created inside the flyte-sandbox.
s
@Kevin Su, any idea why this might be happening?
@KS Tarun, can you try pushing the image to the GitHub registry or ECR and using that as the image? Not sure if that’ll fix the problem, but you can give it a try to get unblocked.
k
@Samhita Alla, sure, I'll try it out.
k
I can import xgboost after changing
RUN ${VENV}/bin/pip install -r /root/requirements.txt
to
pip install -r /root/requirements.txt
s
That’s weird. We do have this in some of our Dockerfiles; need to verify then. Glad that you got it working.
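Since the fix involved switching between the virtual environment's pip and the pip found on PATH, it may help to see which interpreter and pip actually resolve inside the container. The following is a minimal sketch, not a confirmed diagnosis of this issue; the function name environment_report is hypothetical, and it only reports on the interpreter that runs it.

```python
import shutil
import sys


def environment_report():
    """Collect basic facts about the running interpreter and what PATH resolves to."""
    return {
        "executable": sys.executable,
        # True when running inside a venv created with `python3 -m venv`.
        "in_venv": sys.prefix != sys.base_prefix,
        "prefix": sys.prefix,
        # First `pip` and `python3` found on PATH, or None if absent.
        "pip_on_path": shutil.which("pip"),
        "python3_on_path": shutil.which("python3"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If executable and pip_on_path point into different directories (for example, one under /opt/venv and one under /usr), packages installed with one may not be importable from the other.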