Hi folks, I am running into an error when running ...
# ask-the-community
b
Hi folks, I am running into an error when running a simple hello_world workflow. I am using a custom docker image. Any idea why?
FROM python:3.8-buster

WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root

ARG tag
ARG wandb_api_key
ARG wandb_username

ENV FLYTE_INTERNAL_IMAGE $tag
ENV WANDB_API_KEY $wandb_api_key
ENV WANDB_USERNAME $wandb_username

# Install the AWS cli separately to prevent issues with boto being written over
RUN pip3 install awscli

RUN apt-get update && apt-get install -y curl

ENV VENV /opt/venv
# Virtual environment
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"

# Install Python dependencies
COPY requirements.txt /root/.
RUN pip install -r /root/requirements.txt

# Copy the actual code
COPY src/ /root/src/

[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[ab8dm4nttv49wgvn59kg-n0-0] terminated with exit code (1). Reason [Error]. Message: 
rtlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'site-packages.flytekit'
Traceback (most recent call last):
  File "/opt/venv/bin/pyflyte-fast-execute", line 8, in <module>
    sys.exit(fast_execute_task_cmd())
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 513, in fast_execute_task_cmd
    subprocess.run(cmd, check=True)
  File "/usr/local/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['pyflyte-execute', '--inputs', '<s3://senn-ai-mlops-flyte/metadata/propeller/flytesnacks-development-ab8dm4nttv49wgvn59kg/n0/data/inputs.pb>', '--output-prefix', '<s3://senn-ai-mlops-flyte/metadata/propeller/flytesnacks-development-ab8dm4nttv49wgvn59kg/n0/data/0>', '--raw-output-data-prefix', '<s3://senn-ai-mlops-flyte/data/sk/ab8dm4nttv49wgvn59kg-n0-0>', '--checkpoint-path', '<s3://senn-ai-mlops-flyte/data/sk/ab8dm4nttv49wgvn59kg-n0-0/_flytecheckpoints>', '--prev-checkpoint', '""', '--dynamic-addl-distro', '<s3://senn-ai-mlops-flyte/mo/flytesnacks/development/NRI2T5OSZMCFXKR4CUNLWO7MSM======/fastab502ef8d4ae75b6b5497a94633e8642.tar.gz>', '--dynamic-dest-dir', '/root', '--resolver', 'site-packages.flytekit.core.python_auto_container.default_task_resolver', '--', 'task-module', 'src.workflows.hello_world', 'task-name', 'say_hello']' returned non-zero exit status 1.
d
Hi @Bosco Raju, could you share the folder structure of your project?
b
src
├── __init__.py
├── __pycache__
│   ├── __init__.cpython-310.pyc
│   ├── data_preparation.cpython-310.pyc
│   └── data_validation.cpython-310.pyc
├── data_preparation.py
├── data_validation.py
├── notebooks
│   └── Convert_DS.ipynb
└── workflows
    ├── __init__.py
    ├── __pycache__
    │   ├── __init__.cpython-310.pyc
    │   ├── hello_world.cpython-310.pyc
    │   ├── taxi_prediction_sagemaker.cpython-310.pyc
    │   └── taxi_prediction_wandb.cpython-310.pyc
    ├── hello_world.py
    ├── taxi_prediction_sagemaker.py
    └── taxi_prediction_wandb.py
Requirements.txt
flytekit==1.4.2
pandas
scikit-learn
flytekitplugins-whylogs
whylogs[s3]
whylogs[mlflow]
whylogs[whylabs]
wandb
How do I change the resolver to "flytekit.core.python_auto_container.default_task_resolver"?
@David Espejo (he/him) Were you able to replicate the error on your side?
d
sorry, I wasn't
n
Interesting… site-packages.flytekit is not a valid module. What commands are you invoking to run/package/register the workflow on the cluster?
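A small sketch of why that name looks wrong: each dot-separated part of a conventional module path should be a legal Python identifier, and "site-packages" contains a hyphen. The helper name is_importable_name is hypothetical and is not part of flytekit. The check is purely syntactic; it says nothing about whether the module is actually installed.

```python
import keyword


def is_importable_name(dotted_path: str) -> bool:
    """Return True if every dot-separated part is a legal Python identifier."""
    parts = dotted_path.split(".")
    return all(p.isidentifier() and not keyword.iskeyword(p) for p in parts)


# Module part of the resolver seen in the failing command vs. the expected one.
bad = "site-packages.flytekit.core.python_auto_container"
good = "flytekit.core.python_auto_container"

print(bad, is_importable_name(bad))
print(good, is_importable_name(good))
```

If the resolver recorded at registration time fails this check, the problem most likely lies in the environment where pyflyte register was run rather than in the image.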
b
Within the src dir I ran pyflyte register workflows --image xxxx
I have tried creating a new project with pyflyte init hello-world and I am getting the same error. I have not had this issue before. I would appreciate it if you could try it on your side.
s
How have you set up your Flyte deployment?
b
@Samhita Alla I deployed Flyte on EKS
n
Can you try registering the workflow one level up from the src directory?
b
How would I do that? I assumed you have to run pyflyte register workflows within the folder. It complains that it can't find the workflow folder.
s
You can send the full path to pyflyte register. Moreover, can you remove __init__.py in your src directory? I also recommend that you start afresh by creating a simple workflow, adding it to a directory and registering it. That should help you find the source of the error.
b
@Samhita Alla I am able to register and run a basic workflow using the default image. But when I build my own image from the Dockerfile below, my workflow doesn't run and throws the error mentioned above. The registration process is the same; nothing has changed except the custom image. I am building this image on my M1 Mac for linux/amd64 (DOCKER_DEFAULT_PLATFORM=linux/amd64). You can try the image I built: docker.io/lehyperion/hello-world-flytekit:0.1.0.
FROM python:3.8.16-slim-buster

WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root

RUN apt-get update && apt-get install -y build-essential

# Install the AWS cli separately to prevent issues with boto being written over
RUN pip3 install awscli

ENV VENV /opt/venv
# Virtual environment
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"

# Install Python dependencies
COPY ./requirements.txt /root
RUN pip install -r /root/requirements.txt

# Copy the actual code
COPY . /root/

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag
Requirements.txt
flytekit==1.4.2
pandas
scikit-learn
s
@Bosco Raju, the same dockerfile works for me.
Can you run your workflow against the custom docker image on a demo cluster instead of the EKS one?
Also, you needn't copy code in your dockerfile since you're fast registering your tasks and workflows.
b
@Samhita Alla I found the issue. It has to do with the sagemaker-experiments Python package. I had that package installed on my local machine (where I register the workflow). Removing it solved my problem. The error message is not intuitive for debugging, and I don't have the in-depth knowledge to debug this further. Maybe this is something for the Flyte team to look into. Thanks to everyone for helping me out. This is not a trivial problem to debug.
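For anyone hitting something similar, here is a rough sketch of one way to narrow down candidates: list the packages installed in the local registration environment that the image's requirements.txt does not mention. The function names are hypothetical and this is not a Flyte tool. Transitive dependencies will also show up in the output, so it only shortens the list to inspect by hand.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """Normalize a distribution name (lowercase, runs of -_. become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(lines):
    """Extract bare package names from requirements.txt-style lines."""
    names = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:  # skips blank lines and option lines such as "-r other.txt"
            names.add(normalize(match.group(0)))
    return names


def local_only(installed, required):
    """Names installed locally that the requirements list does not mention."""
    return sorted({normalize(n) for n in installed} - set(required))


def installed_names():
    """Names of all distributions visible to the current interpreter."""
    return [d.metadata["Name"] for d in metadata.distributions() if d.metadata["Name"]]


if __name__ == "__main__":
    # Requirements taken from the second Dockerfile's requirements.txt above.
    required = requirement_names(["flytekit==1.4.2", "pandas", "scikit-learn"])
    for name in local_only(installed_names(), required):
        print(name)
```

Run it with the same interpreter that runs pyflyte register; in this thread, sagemaker-experiments would have appeared among the local-only names.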
s
Good to know you found the issue. I'm not sure if this is something we need to fix on our end, but I would appreciate it if you could add the error you encountered, along with the repro steps, to https://github.com/flyteorg/flyte/discussions/3502.