I have a bunch of workflows that `pyflyte --pkgs m...
# ask-the-community
j
I have a bunch of workflows that I package with `pyflyte --pkgs mypackage package` and register with `flytectl register files`, and they work just fine. Today I ran into one that always fails when run, with the error `ModuleNotFoundError: No module named 'mypackage'`. After reading through others' issues with package + register, I tried adding the `--fast` flag, and voila, it works. My question is, why? Clearly there is some subtle difference between this workflow and my others, but what is `--fast` doing differently? The CLI help for `--fast` says "Note this needs additional configuration, refer to the docs." but I'm not sure which docs it's referring to. It seems that with fast it may be including more of the source code than without. Should I just always use fast, even for packages that clearly work without it?
l
My team used to run into this. This might be the cause if you have a layout like `/src/pipeline/my_package` and you import `my_package`: you would need `/src/pipeline` on your PYTHONPATH in the Dockerfile (usually it is `/root` only). When we fast-register, we noticed it copies the package to root as `/root/my_package`, which is probably why it suddenly worked. So the way to work with both kinds of registration is to set `ENV PYTHONPATH="/root:src/pipeline"`.
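To illustrate the PYTHONPATH point, here is a minimal sketch using a throwaway temporary directory. The layout and the `can_import` helper are hypothetical and only mirror the path described above; nothing here touches Flyte itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout mirroring the message above: <tmp>/src/pipeline/my_package
root = Path(tempfile.mkdtemp())
pkg_dir = root / "src" / "pipeline" / "my_package"
pkg_dir.mkdir(parents=True)
(pkg_dir / "__init__.py").write_text("NAME = 'my_package'\n")


def can_import(name: str) -> bool:
    """Return True if `name` resolves on the current sys.path."""
    importlib.invalidate_caches()
    sys.modules.pop(name, None)
    try:
        importlib.import_module(name)
        return True
    except ModuleNotFoundError:
        return False


# The package's parent directory is not on sys.path yet.
before = can_import("my_package")

# Entries in PYTHONPATH end up on sys.path at interpreter startup;
# inserting the parent directory by hand has the same effect here.
sys.path.insert(0, str(root / "src" / "pipeline"))
after = can_import("my_package")

print(before, after)
```

The import only succeeds once the directory that contains `my_package` is on the search path, which is the same situation as inside the container.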
j
this could be down the right path, but our structure is
<root>/
  mypackage/
    myworkflow.py
    __init__.py
where `pyflyte package` is run from within `<root>`. The frustrating part is that this is identical in structure to others that currently work, so some other difference must exist. And, as with all these kinds of issues, it does not occur with `pyflyte run` or `pyflyte register`, but I believe those are all fast by default?
k
Yes it is all fast by default
Sadly if we change the cli we break people VC @Eduardo Apolinario (eapolinario) / @Yee would like functional tests
j
I've been playing around with it to understand `fast`, and I think it's just an understanding issue. The following is added to the running container when run with fast:
--additional-distribution s3://<bucket>/<project>/<domain>/7BPRMBPU3RS26QKXJIKS74XBBI======/fastc06f194bab8765ca864871b6ab6504ad.tar.gz
--dest-dir /root
Which must be taking the source code packaged by fast and adding it to the base image. The reason this workflow didn't work and our others did is that we primarily use `envd`, but there are a few simple cases where we don't need any extra requirements. In those cases, we just use the task/workflow decorators without any image and rely on Flyte's base image. Because the development flow (`pyflyte run`) is fast, it just works, since the source is added to the container at runtime. The production examples don't use fast, as there must be an assumption that you are pre-baking all the images with the source code (in our case envd is doing this, but it could just be docker build). So `fast` really means "no pre-built container". In these simple cases, pre-building a container with the source seems unnecessary. I think I'll add a mechanism to our pipeline to conditionally package with `fast` in these scenarios.
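A rough local simulation of that idea, assuming fast mode amounts to "archive the source, then unpack it into the destination directory at runtime". The package name `fastdemo_pkg`, the archive name, and the temporary directories are made up for illustration; this is not Flyte's actual implementation.

```python
import importlib
import sys
import tarfile
import tempfile
from pathlib import Path

work = Path(tempfile.mkdtemp())

# 1. "Package" step: a source tree on the developer machine (hypothetical names).
src = work / "project"
(src / "fastdemo_pkg").mkdir(parents=True)
(src / "fastdemo_pkg" / "__init__.py").write_text("")
(src / "fastdemo_pkg" / "myworkflow.py").write_text(
    "def hello():\n    return 'hi'\n"
)

# Stands in for the uploaded fast*.tar.gz archive.
archive = work / "fast_source.tar.gz"
with tarfile.open(archive, "w:gz") as tar:
    tar.add(src / "fastdemo_pkg", arcname="fastdemo_pkg")

# 2. "Runtime" step: unpack into the destination (stands in for --dest-dir /root).
dest_dir = work / "root"
dest_dir.mkdir()
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(dest_dir)

# 3. The "image" never contained the package, but it is importable now.
sys.path.insert(0, str(dest_dir))
importlib.invalidate_caches()
module = importlib.import_module("fastdemo_pkg.myworkflow")
result = module.hello()
print(result)
```

Without step 2 the import in step 3 would fail with `ModuleNotFoundError`, which matches the behaviour seen when registering without fast against an image that does not contain the source.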
Is there a reason why it couldn't be dynamic? If there are tasks without an overriding container_image it builds fast (or at least warns you are doing something wrong without fast)
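One way the conditional packaging could look on the pipeline side, as a sketch only. `build_package_command` and its `container_image` argument are hypothetical names for our own tooling, not part of pyflyte; the only CLI pieces used are the `pyflyte --pkgs ... package` command and the `--fast` flag mentioned above.

```python
from typing import List, Optional


def build_package_command(pkg: str, container_image: Optional[str]) -> List[str]:
    """Build the pyflyte package command, adding --fast when no image is pre-built.

    `container_image` is a hypothetical pipeline input: the custom image that
    already has the source baked in, or None when relying on the default image.
    """
    cmd = ["pyflyte", "--pkgs", pkg, "package"]
    if container_image is None:
        # No pre-built container holds the source, so ship it with fast.
        cmd.append("--fast")
    return cmd


# Usage: one workflow relies on the base image, one has a pre-built image.
print(build_package_command("mypackage", None))
print(build_package_command("mypackage", "registry.example.com/mypackage:1.0"))
```

The resulting list can be handed to `subprocess.run` in the pipeline; the image reference in the usage example is a placeholder.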
y
@Justin Boutwell your understanding is spot on. i’m not sure if the production case is necessarily something we assumed though - i think there’s a fair number of users who rely on the ‘fast’ construct in prod.
no reason it can’t be, wrt the suggestion - except that more magic might be more confusing for users. certainly feasible.