Question: Is it possible to pass modules along wit...
# ask-the-community
t
Question: Is it possible to pass modules along with the workflow file when executing the workflow? As an example, I modified
basic_workflow.py
to import a function from a module instead of defining it directly. Here is the workflow code:
basic_workflow.py
(modified):
import typing
from typing import Tuple

from example import example_fn
from flytekit import task, workflow


@task
def t1(a: int) -> typing.NamedTuple("OutputsBC", t1_int_output=int, c=str):
    # return a + 2, "world"
    return example_fn(a)


@task
def t2(a: str, b: str) -> str:
    return b + a


@workflow
def module_wf(a: int, b: str) -> Tuple[int, str]:
    x, y = t1(a=a)
    d = t2(a=y, b=b)
    return x, d
(notice
from example import example_fn
)
example.py
(the imported module):
from typing import Tuple

def example_fn(a: int) -> Tuple[int, str]:
    return a + 2, "world"
When I execute:
pyflyte run --remote basic_workflow.py module_wf --a 5 --b hello
I get the error:
ModuleNotFoundError: No module named 'example'
n
hey @Tom Szumowski what does the directory structure look like?
t
@Niels Bantilan
~/workspace/flyte/experiments/module_test
$ l
total 16
-rw-r--r--   1 szumowskit1  288552604    0 Aug  1 13:18 __init__.py
drwxr-xr-x  15 szumowskit1  288552604  480 Aug  1 13:19 ..
-rw-r--r--   1 szumowskit1  288552604   94 Aug  1 13:25 example.py
-rw-r--r--   1 szumowskit1  288552604  418 Aug  1 13:25 basic_workflow.py
drwxr-xr-x   5 szumowskit1  288552604  160 Aug  1 14:43 .
and
pyflyte
was executed from that directory
y
hi
yeah so this won’t work - we have work to do on our end to error out earlier. the point of the pyflyte
run
command, was that it was a light-weight thing that would help people quickly iterate on small self-contained scripts
when you
run
something, that script gets zipped up and put somewhere, and when the task runs on the backend, it gets pulled down
but it only works on one file (maybe can revisit in the future)
can you instead modify your command and use
pyflyte register
instead?
this will not launch an execution for you however
so there’s no
my_wf --input_a foo
component to the command, just leave that off
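The single-file behavior described above can be reproduced locally without a cluster. This is only a rough simulation, not flytekit's actual upload logic: the file names `wf_script.py` and `example_helper.py` are hypothetical stand-ins for `basic_workflow.py` and `example.py`, and the "remote" side is just a second temporary directory that receives the entry script alone.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-ins for example.py and basic_workflow.py from the thread.
HELPER = 'def example_fn(a):\n    return a + 2, "world"\n'
SCRIPT = "from example_helper import example_fn\nprint(example_fn(5))\n"


def run_script(directory: Path) -> subprocess.CompletedProcess:
    """Run wf_script.py with `directory` as the working directory."""
    return subprocess.run(
        [sys.executable, "wf_script.py"],
        cwd=directory,
        capture_output=True,
        text=True,
    )


def simulate_single_file_upload():
    """Return (local_result, remote_result) for the two directories."""
    with tempfile.TemporaryDirectory() as tmp:
        local = Path(tmp, "local")
        remote = Path(tmp, "remote")
        local.mkdir()
        remote.mkdir()
        # Locally, both the entry script and the helper module exist.
        (local / "example_helper.py").write_text(HELPER)
        (local / "wf_script.py").write_text(SCRIPT)
        # Only the entry script is "uploaded"; the helper stays behind.
        (remote / "wf_script.py").write_text(SCRIPT)
        return run_script(local), run_script(remote)


local_result, remote_result = simulate_single_file_upload()
print("local :", local_result.stdout.strip())
print("remote:", remote_result.stderr.strip().splitlines()[-1])
```

The local run succeeds because the helper sits next to the script, while the "remote" run should fail with the same kind of `ModuleNotFoundError` reported at the top of the thread.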
t
@Yee thank you for the clarification. I take it you were referring to
flytectl register
?
If so, do I go through the packaging steps as shown here? https://docs.flyte.org/projects/cookbook/en/latest/auto/larger_apps/larger_apps_deploy.html Otherwise I get:
$ flytectl register files --project=flytesnacks --domain=development basic_workflow.py 
Error: input package have some invalid files. try to run pyflyte package again [basic_workflow.py]
_(Side note: I see all of this mentioned in the docs now, here.)_ 🤦🏻
you’ve been running Flyte workflows as one-off scripts, which is useful for quick prototyping and iteration of small ideas on a Flyte cluster
if you need to build a larger Flyte app with sub-modules or sub-packages to organize the logic of your tasks and workflows
...
y
pyflyte register
(the docs for which are unfortunately still coming 😞)
t
I get a No Command error:
$ pyflyte register basic_workflow.py
Usage: pyflyte [OPTIONS] COMMAND [ARGS]...
Try 'pyflyte --help' for help.

Error: No such command 'register'.
Just tossing in some additional experimentation. I was able to do (what I believe to be) a full registration via these commands: Serialize:
$ pyflyte --pkgs module_test package --image gcr.io/urbn-data-science/flytekit-test-wrapper:latest
Register:
$ flytectl register files --project flytesnacks --domain development --archive flyte-package.tgz  --version v1
where
module_test
is the directory listed above. Then after kicking off a run of that workflow in the console, I get the same error:
ModuleNotFoundError: No module named 'example'
Granted, I can package up the code in the Docker build. But was curious if there was a way to serialize module imports instead of having to bake it in the Docker image.
n
can you copy-paste the Docker image you’re using?
t
@Niels Bantilan
FROM gcr.io/deeplearning-platform-release/pytorch-gpu:latest
# FROM python:3.8
# FROM gcr.io/urbn-data-science/flytekit-test-wrapper:latest

WORKDIR /root
ENV PYTHONPATH /root

RUN pip install gsutil

# Pod tasks should be exposed in the default image
RUN pip install -U flytekit flytekitplugins-pod

# Required for gsutil to work with workload-identity
RUN echo '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg

ENV FLYTE_INTERNAL_IMAGE "FOOBAR"
It's a copy/paste of the flytekit Dockerfile.python3.8 file, but layered on a GCP deep learning GPU image. I am able to use this image for single-script
pyflyte
calls, such as:
pyflyte run --image gcr.io/urbn-data-science/flytekit-test-wrapper:latest --remote basic_workflow.py module_wf --a 5 --b hello
n
interesting… is there a bit in there that `COPY`s the source code into the Docker image?
t
No. That's kind of what I'm trying to better understand, i.e., does the code always need to be baked into the Docker build, or can it be mapped in at runtime? I understand for single-script
pyflyte run
runs, it does not need to be. But for ones that import modules, I suspect it needs to be baked into the Docker image build? Following the docs, I noticed the Dockerfile auto-generated here copies the code in. But wasn't sure if that was always needed or not.
n
@Yee would probably know better, but for the OG flow (serialize with
pyflyte package
then register with
flytectl register
) you need to bake the source code into the Docker container. For
pyflyte run
I don’t think this is the case … it uses “fast serialization” to load your source code from blob store when the task is executed. Not sure about `pyflyte register`… I’m guessing source code needs to be included in the Docker container.
For
pyflyte run
I don’t think this is the case … it uses “fast serialization” to load your source code from blob store when the task is executed.
I suspect this is why you originally got the
No module named 'example'
error…
pyflyte run
doesn’t support user-defined imports
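The idea of shipping source code separately from the image and unpacking it when the task starts can be sketched as follows. This is an illustration of the general technique only, not flytekit's fast-serialization implementation: the archive is a local zip file rather than an object in blob store, and `wf_script.py` and `example_helper.py` are hypothetical stand-ins for the thread's two files.

```python
import subprocess
import sys
import tempfile
import zipfile
from pathlib import Path

# Hypothetical stand-ins for example.py and basic_workflow.py from the thread.
FILES = {
    "example_helper.py": 'def example_fn(a):\n    return a + 2, "world"\n',
    "wf_script.py": "from example_helper import example_fn\nprint(example_fn(5))\n",
}


def package_and_run() -> str:
    """Archive a source directory, unpack it elsewhere, and run the script."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "src")
        runtime = Path(tmp, "runtime")
        src.mkdir()
        runtime.mkdir()
        for name, body in FILES.items():
            (src / name).write_text(body)

        # "Upload": archive every file in the source directory, not just one.
        archive = Path(tmp, "code.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            for path in sorted(src.iterdir()):
                zf.write(path, arcname=path.name)

        # "Task start": unpack into a directory that never held the code.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(runtime)

        result = subprocess.run(
            [sys.executable, "wf_script.py"],
            cwd=runtime,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


output = package_and_run()
print(output)
```

Because the helper module travels in the same archive as the entry script, the import resolves in the unpacked directory even though nothing was baked in ahead of time.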
t
@Niels Bantilan Thank you. Makes sense. Just me playing around with the workflows a bit. Ketan mentioned here that typical CI flows bake the code in, and that would work for us too. Just working through the different ways to run workflows while experimenting 🙂
👍 1
n
cool! sounds good. yeah right now
pyflyte run
is for fast prototyping and demo use cases… for anything more serious/complex
pyflyte register
or
pyflyte package + flytectl register
is probably the way to go.
```
$ pyflyte register basic_workflow.py
Usage: pyflyte [OPTIONS] COMMAND [ARGS]...
Try 'pyflyte --help' for help.
Error: No such command 'register'.
```
I think you need
flytekit >= 1.1.0
for that command
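A quick way to check the installed version before reaching for `pyflyte register` is sketched below. The 1.1.0 threshold is taken from the suggestion in this thread, not from release notes; the helper names are hypothetical, and the parser deliberately ignores pre-release suffixes.

```python
import re
from importlib import metadata

# Threshold suggested in the thread; treat it as an assumption.
MIN_REGISTER_VERSION = (1, 1, 0)


def parse_version(version: str) -> tuple:
    """Extract leading numeric components, e.g. '1.0.3' -> (1, 0, 3)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # Missing components (as in '1.2') are treated as zero.
    return tuple(int(part or 0) for part in match.groups())


def supports_register(version: str) -> bool:
    """True if the version is at or above the assumed threshold."""
    return parse_version(version) >= MIN_REGISTER_VERSION


def installed_flytekit_version():
    """Return the installed flytekit version string, or None if absent."""
    try:
        return metadata.version("flytekit")
    except metadata.PackageNotFoundError:
        return None


installed = installed_flytekit_version()
if installed is None:
    print("flytekit is not installed in this environment")
else:
    print(f"flytekit {installed}: register expected = {supports_register(installed)}")
```

With the versions mentioned in this thread, `supports_register("1.0.3")` is false and `supports_register("1.1.0")` is true.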
t
@Niels Bantilan ah! I somehow was on 1.0.3 even though I installed just last week. Must've been following some older install notes. In any case, @Yee, after upgrading to 1.1.0,
pyflyte register
worked without needing to bake the code into the Docker image. 🎉 Thank you both!
💯 1