# flyte-support
a
Hello, we’re developing a pipeline register that uses
FlyteRemote.register_workflow()
to register and run a workflow. We discovered that if an
__init__.py
file is present in the directory where the registration code resides, like
# flyte_playground directory
from flytekit.remote import FlyteRemote
from flytekit.configuration import Config, ImageConfig
from autojoin_flyte_workflow.workflow import wf
from flytekit.configuration import SerializationSettings, FastSerializationSettings


remote = FlyteRemote(config=Config.auto(), default_project="flytesnacks", default_domain="development")

_, native_url = remote.fast_package(root=".")

flyte_workflow = remote.register_workflow(
    entity=wf,
    serialization_settings=SerializationSettings(
        image_config=ImageConfig.auto_default_image(),
        project="flytesnacks",
        domain="development",
        version="remote_registration_v0",
        fast_serialization_settings=FastSerializationSettings(enabled=True, destination_dir=".", distribution_location=native_url),
    ),
)

remote.execute(
    entity=flyte_workflow,
    # inputs={"dirpath": "s3://my-s3-bucket/input-data-test", "output_location": "s3://my-s3-bucket/output-data-test/"},
    inputs={"dirpath": "gs://igenius-flyte-userdata/alessandro-test/input-data", "output_location": "gs://igenius-flyte-userdata/alessandro-test/output-data/"}
)
, the task can't execute because of a
ModuleNotFoundError: No module named 'flyte_playground'
error. As soon as we remove the
__init__.py
it starts working, but then we can no longer import the code above from elsewhere in our project. Do you have any advice on how to handle a situation like this? Thanks
t
init is used to determine the path of the task/workflow, so that might be causing the error you're seeing. is it not working when you have init in the flyte_playground directory? would you mind sharing your directory structure?
b
Hi @tall-lock-23197! 👋 Indeed, the error arises with the init in
flyte_playground
... with this tree
flyte_playground
├── hello_world_dir
│   ├── __init__.py
│   └── hello_world.py
└── register_workflow.py
registration and run both work, but with this
flyte_playground
├── __init__.py
├── hello_world_dir
│   ├── __init__.py
│   └── hello_world.py
└── register_workflow.py
the workflow is registered but then at run time we get
ModuleNotFoundError: No module named 'flyte_playground'
. However, without the init we can't really import the workflows in hello_world_dir from other parts of the project (where flyte_playground resides). This is how we're registering the workflow (the content of
register_workflow.py
, we just run
python register_workflow.py
from within flyte_playground):
from flytekit.remote import FlyteRemote
from flytekit.configuration import Config, ImageConfig
from hello_world_dir.hello_world import hello_world_workflow
from flytekit.configuration import SerializationSettings, FastSerializationSettings


remote = FlyteRemote(config=Config.auto(), default_project="flytesnacks", default_domain="development")

_, native_url = remote.fast_package(root=".")

flyte_workflow = remote.register_workflow(
    entity=hello_world_workflow,
    serialization_settings=SerializationSettings(
        image_config=ImageConfig.auto_default_image(),
        project="flytesnacks",
        domain="development",
        version="1.0.1",
        fast_serialization_settings=FastSerializationSettings(enabled=True, destination_dir=".", distribution_location=native_url),
    ),
)

remote.execute(
    entity=flyte_workflow,
    inputs={"name": "foo", "answer": "bar"},
)
We were taking inspiration from this old thread of yours. Is there a way to handle this with
root
and
destination_dir
at fast registration?
t
can you try modifying
destination_dir
to say,
/root
? cc @glamorous-carpet-83516
b
Sure,
destination_dir="/root"
yields
ModuleNotFoundError: No module named 'flyte_playground'
upon execution (workflow gets registered)
t
what's the name of the task, i mean the whole path that's being shown on the UI?
b
With the init file in
flyte_playground
and
destination_dir="/root"
, the workflow gets registered as
flyte_playground.hello_world_dir.hello_world.hello_world_workflow
and doesn't run with
ModuleNotFoundError: No module named 'flyte_playground'
. Instead without the init in
flyte_playground
(still
destination_dir="/root"
) the workflow is registered as
hello_world_dir.hello_world.hello_world_workflow
and runs successfully
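The difference between the two registered names can be reproduced locally without Flyte. The sketch below is only an illustration under assumptions: a temporary directory stands in for the container's destination_dir, and the helper try_import is hypothetical, not part of flytekit. It unpacks the contents of flyte_playground (not the directory itself) and then tries both module paths.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import(module_name: str, search_root: Path) -> str:
    """Import module_name with search_root on sys.path and report the outcome."""
    sys.path.insert(0, str(search_root))
    try:
        importlib.invalidate_caches()
        importlib.import_module(module_name)
        return "ok"
    except ModuleNotFoundError as exc:
        return f"ModuleNotFoundError: {exc}"
    finally:
        sys.path.remove(str(search_root))
        # Drop cached modules so that repeated calls start from a clean state.
        for name in list(sys.modules):
            if name.split(".")[0] in {"hello_world_dir", "flyte_playground"}:
                del sys.modules[name]


# Stand-in for destination_dir: it holds the *contents* of flyte_playground,
# so there is no directory called flyte_playground on the import path.
dest = Path(tempfile.mkdtemp())
(dest / "hello_world_dir").mkdir()
(dest / "__init__.py").touch()
(dest / "hello_world_dir" / "__init__.py").touch()
(dest / "hello_world_dir" / "hello_world.py").write_text(
    "def hello_world_workflow():\n    return 'hi'\n"
)

short_name = try_import("hello_world_dir.hello_world", dest)
long_name = try_import("flyte_playground.hello_world_dir.hello_world", dest)
print(short_name)
print(long_name)
```

The short path resolves because hello_world_dir sits directly in the search root, while the longer path fails at its first component, which matches the error seen at execution time.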
t
i think you need to move the remote execution script one level above
flyte_playground
.
b
Moving it one level above still yields
ModuleNotFoundError: No module named 'flyte_playground'
. I also changed the root in remote.fast_package to
"flyte_playground"
to avoid registering everything that's one level above (a lot of stuff), and kept
destination_dir="/root"
. Anything else I should have changed?
t
does `flyte_playground`'s parent directory have an
__init__.py
file?
b
It didn't, just added it - but it only changes the registration name and error. Basically with the init in the parent the workflow gets registered as
parent.flyte_playground.hello_world_dir.hello_world.hello_world_workflow
and the error at execution is
ModuleNotFoundError: No module named 'parent'
t
no don't add init to the parent directory, it isn't necessary. would you mind untarring the fast registration package so that we know what the parent directory is? you can find it in the task details section on the UI.
b
I guess the parent is flyte_playground (without the additional init one level above), by untarring I got back its content
x hello_world_dir/
x hello_world_dir/__init__.py
x hello_world_dir/hello_world.py
x __init__.py
Compared with this message, the remote execution file is missing because it was moved one level above. (FYI: couldn't find the tar file in the UI, had to rely on the message that I get on terminal when running the remote registration script:
No output path provided, using a temporary directory at /var/folders/... instead
)
t
untarring should give
flyte_playground
directory, not the contents of it. makes sense?
i guess it has to do with
remote.fast_package(root="flyte_playground")
. when you move the remote script one level above and modify the root, it'll package the contents of the flyte_playground directory, when it must instead contain the flyte_playground directory itself.
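To make the "contents vs. directory" point concrete, here is a minimal sketch using only the standard library tarfile module. It is not how flytekit builds its package; the helper package and the temporary layout are assumptions for illustration. It archives the same tree twice, once rooted at flyte_playground and once rooted at its parent, and compares the member names.

```python
import tarfile
import tempfile
from pathlib import Path


def package(root: Path, out: Path) -> list:
    """Archive everything under root with paths relative to root; return member names."""
    with tarfile.open(out, "w:gz") as tar:
        for path in sorted(root.rglob("*")):
            tar.add(path, arcname=path.relative_to(root).as_posix(), recursive=False)
    with tarfile.open(out) as tar:
        return tar.getnames()


# Hypothetical layout mirroring the thread: parent/flyte_playground/hello_world_dir/...
parent = Path(tempfile.mkdtemp())
pkg = parent / "flyte_playground"
(pkg / "hello_world_dir").mkdir(parents=True)
(pkg / "__init__.py").touch()
(pkg / "hello_world_dir" / "__init__.py").touch()
(pkg / "hello_world_dir" / "hello_world.py").touch()

# Archives are written to a separate directory so they never include themselves.
out_dir = Path(tempfile.mkdtemp())
inner = package(pkg, out_dir / "inner.tar.gz")     # root = flyte_playground
outer = package(parent, out_dir / "outer.tar.gz")  # root = parent directory

print(inner)
print(outer)
```

With the root set to flyte_playground, no member path starts with flyte_playground, so a module registered as flyte_playground.hello_world_dir.hello_world has nothing to resolve against after extraction. With the root set to the parent, the flyte_playground prefix is preserved.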
b
How should I set the root then?
t
it has to be the
parent
directory.
b
But won't that package the entire content of the parent directory? Meaning that if there's other stuff within
parent
, that will get packaged too, am I wrong?
t
yes, it'll get packaged. you can add files or packages that you don't want to serialize to .flyteignore
b
Ok, so the gist is that in the root dir where we run the registration file there must not be any init file, right? Or is there a way around this? (Note: I didn't find any documentation about
.flyteignore
, is there any?) In the meantime, thank you, Samhita! 🙏
t
> so the gist is that in the root dir where we run the registration file there must not be any init file, right?
that's right. that's how the task path is determined.
> I didn't find any documentation about
> .flyteignore
> , is there any?
we'll need to document this but it's available: https://github.com/flyteorg/flytekit/blob/5b3e7256c8cb14f19dd727e964b26be346d491b5/flytekit/tools/ignore.py#L90
b
> that's right. that's how the task path is determined.
So
root
in remote.fast_package only changes the fast registration package path but not the task path? I would have expected them to be related, so that changing root at registration also changed the task path. Might that be a bit of a limitation for some use cases?
t
task path is dependent on
__init__.py
files. it'll begin from the directory whose parent directory doesn't have an init file.
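The rule described here can be sketched in a few lines. This is an approximation of the behavior discussed in the thread, not flytekit's actual implementation, and the function name dotted_module_name is hypothetical: starting from the file, it walks upward and keeps adding directory names for as long as each directory contains an __init__.py.

```python
import tempfile
from pathlib import Path


def dotted_module_name(py_file: Path) -> str:
    """Build a dotted module path by walking up while directories contain __init__.py."""
    parts = [py_file.stem]
    directory = py_file.parent
    # Stop at the first directory without __init__.py (or at the filesystem root).
    while (directory / "__init__.py").exists() and directory != directory.parent:
        parts.append(directory.name)
        directory = directory.parent
    return ".".join(reversed(parts))


# Hypothetical layout mirroring the thread.
root = Path(tempfile.mkdtemp())
pkg = root / "flyte_playground"
(pkg / "hello_world_dir").mkdir(parents=True)
(pkg / "hello_world_dir" / "__init__.py").touch()
workflow_file = pkg / "hello_world_dir" / "hello_world.py"
workflow_file.touch()

# Without flyte_playground/__init__.py the path starts at hello_world_dir.
without_init = dotted_module_name(workflow_file)

# Adding the init file extends the path by one level.
(pkg / "__init__.py").touch()
with_init = dotted_module_name(workflow_file)

print(without_init)  # hello_world_dir.hello_world
print(with_init)     # flyte_playground.hello_world_dir.hello_world
```

Under this rule the root passed to fast_package plays no part in the name, which is why the packaged layout has to be chosen to match the __init__.py chain rather than the other way around.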