This is a very specific question, so feel free to tell me it's outside the scope of Flyte. We are trying to use pex files in our deployment process to minimize the size of our containers. We seem to be very close, but we're getting a strange error that I can't get to the bottom of.
I am trying to run the container using `pyflyte image run`. When I run the container without the extracted pex files, I get a module-not-found error. This is expected, because our dependencies aren't part of that container. The important part is that the execution command itself works as expected.
When I run the command with our pex files, the task fails with `ModuleNotFoundError: No module named 'workflows'`, i.e. the workflow files aren't found. It's as if something in the Python environment is broken, or the pb file isn't being extracted correctly. This is the error:
```
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[f728f923515794b7cb68-n0-0] terminated with exit code (1). Reason [Error]. Message:
zen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'workflows'
Traceback (most recent call last):
  File "/opt/venv/bin/pyflyte-fast-execute", line 8, in <module>
    sys.exit(fast_execute_task_cmd())
  File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/venv/lib/python3.9/site-packages/flytekit/bin/entrypoint.py", line 513, in fast_execute_task_cmd
    subprocess.run(cmd, check=True)
  File "/usr/local/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['pyflyte-execute', '--inputs', 's3://my-s3-bucket/metadata/propeller/flytesnacks-development-f728f923515794b7cb68/n0/data/inputs.pb', '--output-prefix', 's3://my-s3-bucket/metadata/propeller/flytesnacks-development-f728f923515794b7cb68/n0/data/0', '--raw-output-data-prefix', 's3://my-s3-bucket/data/ox/f728f923515794b7cb68-n0-0', '--checkpoint-path', 's3://my-s3-bucket/data/ox/f728f923515794b7cb68-n0-0/_flytecheckpoints', '--prev-checkpoint', '""', '--dynamic-addl-distro', 's3://my-s3-bucket/flytesnacks/development/KRKCIUHF3ZIBQ6OBQ6FCX2PXRQ======/scriptmode.tar.gz', '--dynamic-dest-dir', '/root', '--resolver', 'flytekit.core.python_auto_container.default_task_resolver', '--', 'task-module', 'workflows.cmi_mvp', 'task-name', 'base_mission_sim_etl']' returned non-zero exit status 1.
```
Our Dockerfile looks like this:
```dockerfile
FROM python:3.9-slim-buster as dependencies
WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root
COPY src.cmi_orchestration/binary-deps.pex /binary-deps.pex
RUN PEX_TOOLS=1 /usr/local/bin/python /binary-deps.pex venv --scope=deps --compile /opt/venv

FROM python:3.9-slim-buster as sources
WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root
COPY src.cmi_orchestration/binary-srcs.pex /binary-srcs.pex
RUN PEX_TOOLS=1 /usr/local/bin/python /binary-srcs.pex venv --scope=srcs --compile /opt/venv

FROM python:3.9-slim-buster as local-dev
WORKDIR /root
COPY --from=dependencies /opt/venv /opt/venv
COPY --from=sources /opt/venv /opt/venv
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root
RUN apt-get update && apt-get install -y build-essential
RUN pip3 install awscli
ENV VENV /opt/venv
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"
ENV ENV_FOR_DYNACONF cluster
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag
```
Does anyone have any tips on what I can do to debug what is going on?
My assumption is that the pb file is fine. If I run the same `pyflyte run` command against a version of the container without the pex copy steps, Flyte finds the workflow files. I suspected that something was getting corrupted in the Flyte installation, so we tried reinstalling it later in the build, alongside awscli. That had no effect. Running `pip freeze` inside each container shows the same set of requirements in both.
If I install the libraries normally, it works as expected. We were hoping to use pex files because they significantly reduce the size of our containers and simplify our deployment process.
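`pip freeze` compares only installed distributions and versions, so two venvs can match there and still differ in how their console scripts start Python. A minimal sketch for comparing that, assuming the venv lives at `/opt/venv` as in the Dockerfile above (`read_shebangs` is a hypothetical helper): it collects the `#!` interpreter line of each script in `bin/`, such as `pyflyte-fast-execute` and `pyflyte-execute`. Interpreter flags are worth checking because `-E` makes Python ignore `PYTHONPATH`, which the Dockerfile sets to `/root`, the same directory passed as `--dynamic-dest-dir`. Whether either image's scripts carry such flags is not established in this post; this only makes it easy to check.

```python
from pathlib import Path


def read_shebangs(bin_dir):
    """Map each regular script in `bin_dir` to its '#!' first line.

    Hypothetical helper for comparing two venvs; not part of pex or flytekit.
    """
    shebangs = {}
    for entry in sorted(Path(bin_dir).iterdir()):
        # Skip directories and symlinks such as bin/python -> /usr/local/bin/python.
        if entry.is_symlink() or not entry.is_file():
            continue
        with entry.open("rb") as fh:
            first_line = fh.readline().decode("utf-8", errors="replace").rstrip()
        if first_line.startswith("#!"):
            shebangs[entry.name] = first_line
    return shebangs


if __name__ == "__main__":
    venv_bin = Path("/opt/venv/bin")  # path assumed from the Dockerfile above
    if venv_bin.is_dir():
        for name, line in read_shebangs(venv_bin).items():
            print(f"{name}: {line}")
    else:
        print(f"{venv_bin} not found; run this inside the container")
```

Running it in both the pex-built image and the normally installed one and diffing the output shows whether the console scripts launch Python the same way.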