# ask-the-community
d
This is a very specific question, so feel free to tell me it is outside the scope of Flyte. We are trying to use pex files as part of our deployment process to minimize the size of our containers. We seem to be very close, but we are getting a strange error that I cannot get to the bottom of. I am trying to run the container using pyflyte image run. When I run the container without the extracted pex files, I get a module-not-found error. This is expected because our dependencies aren't part of the container; the important part is that the execution command itself works as expected. When I run the command with our pex files, it says the workflow files aren't found. It's as if something within the Python environment is broken, or the pb file is not being extracted correctly. This is the error:
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[f728f923515794b7cb68-n0-0] terminated with exit code (1). Reason [Error]. Message: 
zen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'workflows'
Traceback (most recent call last):
  File "/opt/venv/bin/pyflyte-fast-execute", line 8, in <module>
    sys.exit(fast_execute_task_cmd())
  File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/venv/lib/python3.9/site-packages/flytekit/bin/entrypoint.py", line 513, in fast_execute_task_cmd
    subprocess.run(cmd, check=True)
  File "/usr/local/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['pyflyte-execute', '--inputs', 's3://my-s3-bucket/metadata/propeller/flytesnacks-development-f728f923515794b7cb68/n0/data/inputs.pb', '--output-prefix', 's3://my-s3-bucket/metadata/propeller/flytesnacks-development-f728f923515794b7cb68/n0/data/0', '--raw-output-data-prefix', 's3://my-s3-bucket/data/ox/f728f923515794b7cb68-n0-0', '--checkpoint-path', 's3://my-s3-bucket/data/ox/f728f923515794b7cb68-n0-0/_flytecheckpoints', '--prev-checkpoint', '""', '--dynamic-addl-distro', 's3://my-s3-bucket/flytesnacks/development/KRKCIUHF3ZIBQ6OBQ6FCX2PXRQ======/scriptmode.tar.gz', '--dynamic-dest-dir', '/root', '--resolver', 'flytekit.core.python_auto_container.default_task_resolver', '--', 'task-module', 'workflows.cmi_mvp', 'task-name', 'base_mission_sim_etl']' returned non-zero exit status 1.
Our Dockerfile looks like the following:
FROM python:3.9-slim-buster as dependencies
WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root
COPY src.cmi_orchestration/binary-deps.pex /binary-deps.pex
RUN PEX_TOOLS=1 /usr/local/bin/python /binary-deps.pex venv --scope=deps --compile /opt/venv

FROM python:3.9-slim-buster as sources
WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root
COPY src.cmi_orchestration/binary-srcs.pex /binary-srcs.pex
RUN PEX_TOOLS=1 /usr/local/bin/python /binary-srcs.pex venv --scope=srcs --compile /opt/venv

FROM python:3.9-slim-buster as local-dev
WORKDIR /root

COPY --from=dependencies /opt/venv /opt/venv
COPY --from=sources /opt/venv /opt/venv

ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root
RUN apt-get update && apt-get install -y build-essential
RUN pip3 install awscli
ENV VENV /opt/venv

RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"
ENV ENV_FOR_DYNACONF cluster
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag
Does anyone have any tips on how to debug what is going on? My assumption is that the pb file is fine: if I run the same pyflyte run command against a version of the container that doesn't contain the COPY commands, flyte seems to find the workflow files. I thought something might be getting corrupted with flyte, so we tried installing it again later in the build, together with awscli. That didn't have any impact. A pip freeze within the container shows the same set of requirements in both containers. If I install the libraries normally, it works as expected. We were hoping to use the pex files because they significantly reduce the size of our containers and make our deployment process easier.
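One way to narrow this down is to check, from inside the container, which interpreter is actually running and whether `workflows` resolves on its import path. The following is a minimal diagnostic sketch using only the standard library; it is not part of flytekit, the helper name `describe_module` is made up for illustration, and the module names are taken from the traceback above.

```python
import importlib.util
import sys


def describe_module(name):
    """Return the file a module would be imported from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # find_spec raises when a parent package is missing or a module has no usable spec.
        return None
    if spec is None:
        return None
    # Namespace packages have no single origin file.
    return spec.origin or "<namespace package>"


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("prefix:     ", sys.prefix)
    print("sys.path:")
    for entry in sys.path:
        print("   ", entry)
    # Module names below come from the traceback in this thread.
    for module in ("flytekit", "workflows"):
        print(f"{module}: {describe_module(module)}")
```

Running this with the interpreter under /opt/venv/bin in both the working and the failing image, and comparing the printed prefix and sys.path, should show whether /root (where the traceback says the code is unpacked, via --dynamic-dest-dir) is still on the import path.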
r
I'd kicked the tires on something similar (we're standardizing on PEX images vs having python-specific container image builds w/ bazel) -- how are you producing the pex file?
Is it just w/ the pex cli or are you using pants to build it?
d
we are using pants to build it
the target looks like:
r
Hmm, is it possible you aren't including all of the workflow files as deps in the pex_binary target? Our equivalent is not_imported_deps for our pex_library bazel target.
d
shouldn't the workflow files be thrown in as part of the pb file when I run pyflyte run?
r
I don't think so - I think you need to explicitly include the full set of workflow py deps as dependencies
I'm not sure how bundling the pb files will work in pants btw - for us, we directly run registration inside our own bazel macro that renders the pb files in the runfiles tree
d
so you are talking about the python library dependencies for the workflows, as in the libraries specified in the pyproject.toml?
r
Or just other sources within your repo that your workflow/task py modules depend on
My general thought for getting pex working w/ flyte is:
• The pex_binary's entrypoint is used as the container entrypoint, and you have a "trampoline" command which can understand pyflyte-{map, fast}-execute and delegate to the appropriate pyflyte cli subcommand
• The deps explicitly include the whole transitive closure of all the workflow/task/launch plan modules
d
so should I be changing the entrypoint on the container?
r
I think so - otherwise how is the pex being invoked?
It smells like a bit of an antipattern to try to run the vanilla pyflyte entrypoint w/ an unzipped pex
d
that's my impression. This feels super sketchy.
overwriting /opt/venv feels really dangerous
r
I think you should try to reason about how to bake all of the deps into the pex, plus the entrypoint that can understand all of the pyflyte run commands
d
what are you using as your entrypoint currently?
r
We do something like this in an entrypoint.py:
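import sys

# pyflyte_execute and pyflyte are imported elsewhere in the original file (not shown in this snippet).
if sys.argv[1] in ["pyflyte-execute", "pyflyte-map-execute", "pyflyte-fast-execute"]:
    # Task-execution commands are handed straight through.
    sys.exit(pyflyte_execute._pass_through())
elif sys.argv[1] == "pyflyte":
    # Drop the wrapper's own name so the pyflyte CLI sees its usual argv.
    sys.argv = sys.argv[1:]
    sys.exit(pyflyte.main())
else:
    raise RuntimeError()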
and that's the entrypoint to our py_image in bazel land. But equivalently you could use that for a pex_binary's main
we then have that particular single script symlinked to /usr/bin/pyflyte, /usr/bin/pyflyte-run, etc, but in your case it'd just be to where you mounted the pex
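To make the trampoline idea a bit more concrete, here is a self-contained sketch of the dispatch step only. It is an illustration under assumptions, not the code from the snippet above: `resolve_command`, `main`, and the `handlers` mapping are hypothetical names, and the handlers in the demo are stand-ins so the example runs without flytekit installed. It accepts both styles mentioned here, the command name as the first argument or as the name the script was symlinked to.

```python
import os

# Command names the container may be started with (taken from the snippet above).
EXECUTE_COMMANDS = ("pyflyte-execute", "pyflyte-map-execute", "pyflyte-fast-execute")
KNOWN_COMMANDS = EXECUTE_COMMANDS + ("pyflyte",)


def resolve_command(argv):
    """Split argv into (command, remaining_args).

    Symlink style:   /usr/bin/pyflyte-execute --inputs ...
    Argument style:  entrypoint.py pyflyte-execute --inputs ...
    """
    invoked_as = os.path.basename(argv[0])
    if invoked_as in KNOWN_COMMANDS:
        return invoked_as, argv[1:]
    if len(argv) > 1 and argv[1] in KNOWN_COMMANDS:
        return argv[1], argv[2:]
    raise RuntimeError(f"unsupported command line: {argv!r}")


def main(argv, handlers):
    """Call the handler registered for the resolved command with the remaining args."""
    command, args = resolve_command(argv)
    return handlers[command](args)


if __name__ == "__main__":
    # Stand-in handlers that only echo their input; a real entrypoint would
    # delegate to the corresponding pyflyte CLI function here instead.
    demo_handlers = {name: (lambda args, name=name: (name, args)) for name in KNOWN_COMMANDS}
    print(main(["entrypoint.py", "pyflyte-execute", "--inputs", "inputs.pb"], demo_handlers))
    print(main(["/usr/bin/pyflyte", "run", "wf.py"], demo_handlers))
```

The traceback earlier in the thread shows pyflyte-fast-execute launching pyflyte-execute by name through subprocess.run, so whichever style is used, that name presumably still has to resolve on PATH inside the container. That is one reason the symlinks matter.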
d
do you have a link to where you are doing that in the repo?
r
ah, that's a modified snippet from our codebase
FWIW with pants you don't need to use pex for your entrypoint. I think you can just copy sources in and treat your dockerfile normally
It's certainly nicer than pip installing your world into the container & pex is great for a bunch of other reasons, but that may unblock you faster
d
copy sources where?
r
Oh in a docker_image target in pants
i think if you specify the appropriate file group deps you can just proceed as usual w/ a dockerfile
d
you mean like with this type of target?
r
yep
I've been wanting to do a minimal pants/flyte POC for a while, I'll be sure to link you to it if I throw one up
d
that would be amazing.
I wouldn't be putting this much effort into it except it really speeds up our build time and reduces the image size by ~80%
r
(We considered switching from Bazel -> Pants about 6 months ago, but found out it was too much of a lift to unwind bazel from our codebase. That's what got me into it in the first place)
Yep, we love pex images for the same reason
Though, with bazel, py_image solves this in a different way, so the overhead isn't as severe for us. Since we couldn't switch to pants, we rolled our own pex targets for bazel that mimic what pants does under the hood
d
pex is obviously still a bit new to me and our team.
I'm not sure I'm seeing the right path forward with the docker container target. My hunch is the pex file is going to be the easiest. If I understand what you are saying correctly, I will need to do the following:
1. Create a pex file that has the correct flyte commands in it.
2. Add a new entrypoint into the pex file that follows the format you outlined above.
3. Test this using our_pex_file.pex pyflyte-execute
This should error out but be found. Is that about right?
k
@Dan Corbiani I am getting a similar error while running a spark task. Have you resolved this subprocess.CalledProcessError issue?