# ask-the-community
e
Hi all, building my first more complex flyte workflow and I have a couple of questions on best practices. I have a workflow defined in `project_1` that includes a dynamic workflow. The dynamic workflow starts a launchplan defined in `project_2` (see attached sketch). This is my code structure:
Copy code
|- project_1 /
|  |- pyproject.toml
|  |- project_1 /
|  |  |- __init__.py
|  |  |- workflows.py
|  |  |- tasks.py
|  |  |- models.py
|  |  |- ...
|- project_2 /
|  |- pyproject.toml
|  |- project_2 /
|  |  |- __init__.py
|  |  |- workflows.py
|  |  |- tasks.py
|  |  |- models.py
|  |  |- ...
|- Dockerfile
|- docker_build.sh
What I'm doing right now:
• add `project_2` as a dependency to `project_1`
• build a docker image with `project_2` installed as a dependency of `project_1`
• package + register
• run
This approach requires rebuilding the docker image with every execution because fast register will not realise that the launchplan defined in `project_2` should have a different version than the rest of the workflow, and it fails while trying to fetch the workflow. Another option would be to do something like:
Copy code
from flytekit.remote import FlyteRemote
from flytekit.configuration import Config

remote = FlyteRemote(config=Config.auto())
launchplan = remote.fetch_launch_plan(...)
but that would require my pod to be able to reach flyteadmin because it's inside a dynamic workflow. Is there anything I can do to avoid rebuilding the docker image for every run? Am I doing something wrong? Any advice is much appreciated. Another issue is that the way I run things now makes the UI fail to expand the dynamic workflow and doesn't provide any link to the newly spawned workflows, so I can't reliably track the intermediate states or see why something failed (if it did) (see screenshot, the purple task is dynamic). It also leads to all sorts of mixed status reports (see other screenshot).
d
cc @Eduardo Apolinario (eapolinario)
e
@Ena Škopelja and I are on the same team; to speak to the behavior we expected: we expect to need to rebuild when there are changes to workflows/launch plans outside the package which is being fast registered. That makes sense. What we didn't expect is that `pyflyte register` will register a new version of the imported launch plan, which will then try to reference a workflow version that doesn't exist (because it was defined in the external package and not imported). That said, I'm not sure how you would specify a version for the launch plan when "passing by value" like this; maybe it is not desirable functionality, and if so we'd like to understand what the right pattern is here (i.e. how do you correctly "pass by reference" a launch plan or workflow). I think pass by value is a strong preference for us, because it preserves local functionality. Maybe if we keep the launch plan version in sync with the package `__version__` attribute, that could serve as a hint during registration?
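To illustrate the version-sync idea, here's a minimal stdlib-only sketch (the `project_2` package and the version string format are hypothetical stand-ins; in a real repo you would import the actual package and pass the result to e.g. `flytectl register --version`):

```python
# Hypothetical sketch: derive a registration version from the package's
# __version__ so the launch plan version tracks the package release.
import types

# Stand-in for "import project_2"; real code would import the package itself.
project_2 = types.SimpleNamespace(__version__="1.2.3")

registration_version = f"project_2-{project_2.__version__}"
print(registration_version)  # -> project_2-1.2.3
```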
k
Sorry we missed the message, will talk in a bit
n
hi @Ena Škopelja have you tried registering both projects at the same time?
Copy code
pyflyte register project_1 project_2 --image ...
This should fast-register both projects and keep the launchplan code from `project_2` consistent
you can also do this with `pyflyte package`
Copy code
pyflyte --pkgs project_1 --pkgs project_2 package ...
e
I remember trying that and getting an error but I'll give it another go. Thanks 🙏
`pyflyte package` seems to work, but only if it's run from the top-level directory (the parent of `project_1` and `project_2`), thanks! This is what I see with `flytectl register`:
Copy code
{"json":{"exec_id":"a6znzvsvnkmmnsdl6m5d","node":"n2","ns":"development","res_ver":"336142287","routine":"worker-35","wf":"<project name>:<domain name>:project_1.workflows.<workflow_name>"},"level":"error","msg":"handling producing dynamic workflow definition failed with error: [system] unable to retrieve launchplan information ::<launchplan_name>:<version>}
so basically the workflow picks up the project and domain properly, but it doesn't get forwarded to the imported launchplan
Any thoughts about the status mismatch? The CLI is equally confusing.
n
can you copy-paste the command you're running to package?
e
pyflyte --pkgs project_1 --pkgs project_2 package --image <image>
flytectl register --project <project> --domain <domain> --version <version> --archive flyte-package.tgz
n
cool, I suspect that the nested directories might be an issue, but I'm trying to repro locally now
I'm still doing some investigation on supporting this use case, but another option I'd like to throw out here is using `reference_launch_plan`: https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.reference_launch_plan.html The downside of this is that it isn't supported locally (in a python runtime), but it should work on a `flytectl demo` cluster if that's any consolation 😅
hey @Ena Škopelja so I managed to get something working on a flyte demo cluster.
Copy code
.
├── Dockerfile
├── LICENSE
├── README.md
├── docker_build.sh
├── flyte-package.tgz
├── requirements.txt
├── workflows
│   ├── __init__.py  # 👈 module
│   ├── __pycache__
│   └── workflows
│       ├── __init__.py
│       ├── __pycache__
│       └── example.py
└── workflows2
    ├── __init__.py  # 👈 module
    ├── __pycache__
    └── workflows2
        ├── __init__.py
        ├── __pycache__
        └── example.py
To summarize, basically I needed to make the top-level `workflows` and `workflows2` directories modules as well. `workflows2.example` defines a `launch_plan`, which I then import in `workflows.example` with
Copy code
from workflows2.workflows2.example import launch_plan

...

@dynamic
def wf_lp_test() -> typing.List[str]:
    out = []
    for i in range(3):
        out.append(launch_plan()[0])
    return out
This produces a dynamic workflow that spawns three launch plans. I'm not sure if this solution works well for your requirements and current setup, but basically every time you fast register, the latest version of `workflows2` should be used.
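As a stdlib-only illustration of the layout above (all paths are made up in a temp dir, and the `LAUNCH_PLAN_NAME` attribute is a stand-in for the real launch plan): with the `__init__.py` markers in place at both levels, the nested `workflows2.workflows2.example` module resolves from the repo root, which is the import path fast registration relies on inside the container:

```python
# Recreate the directory tree from the message above in a temp dir and show
# that the doubly nested module imports from the repo root.
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()

inner = os.path.join(root, "workflows2", "workflows2")
os.makedirs(inner)
open(os.path.join(root, "workflows2", "__init__.py"), "w").close()  # 👈 module
open(os.path.join(inner, "__init__.py"), "w").close()
with open(os.path.join(inner, "example.py"), "w") as f:
    f.write("LAUNCH_PLAN_NAME = 'launch_plan'\n")  # stand-in for the real LP

sys.path.insert(0, root)  # the repo root, like /root in the container image
example = importlib.import_module("workflows2.workflows2.example")
print(example.LAUNCH_PLAN_NAME)  # -> launch_plan
```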
Another issue is that the way I run things now makes the UI fail to expand the dynamic workflow and doesn’t provide any link to the newly spawned workflows so I can’t reliably track the intermediate states or see why something failed (if it did)
This is currently a limitation in the UI; we need to make a ticket to track this. Would you mind opening up an issue here [flyte-ui] 👇
n
So to summarize, when working with multi-project Flyte repos there are effectively 3 options when using launchplans across projects:
1. Fast register all projects: this requires all projects to be modules defined at the top level of the repo (basically the solution I posted earlier).
2. Use reference_launch_plan: this lets you use launchplans across projects, referenced by project, domain, name, and version. The limitation is that you can't test locally on a python runtime, but it should be testable in a `flytectl demo` cluster.
3. Use FlyteRemote: as mentioned in the top post of this thread, launchplans can be fetched from FlyteAdmin and executed using FlyteRemote. The limitation here is that the dynamic workflow pod needs access to the cluster.
💡 For (1), perhaps fast registration in `pyflyte package` and `pyflyte register` can be made more flexible such that you can reference subdirectories that are packaged up in the fast-registered zip file, like `pyflyte --pkgs <directory>:<module>`, so that, e.g., the `project_1` module can be inside a non-python-module directory but will be made available as a top-level directory in the container `/root`. Perhaps another idea is to inject a `PYTHONPATH` env variable so that Flyte can find all the subprojects in a repo containing multiple Flyte projects. Any other thoughts would be appreciated here @Yee @Eduardo Apolinario (eapolinario)
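To make the `PYTHONPATH` idea concrete, here's a stdlib-only sketch (the repo path and `project_2` package are fabricated in a temp dir): entries in `PYTHONPATH` are prepended to `sys.path` in a child interpreter, so a subproject anywhere in a multi-project repo could be made importable without relocating it:

```python
# Demonstrate that setting PYTHONPATH makes a package in an arbitrary
# directory importable by a child python process.
import os
import subprocess
import sys
import tempfile

repo = tempfile.mkdtemp()
pkg = os.path.join(repo, "project_2")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()

# Child interpreter with PYTHONPATH pointing at the repo root.
env = {**os.environ, "PYTHONPATH": repo}
out = subprocess.run(
    [sys.executable, "-c", "import project_2; print('ok')"],
    env=env, capture_output=True, text=True,
)
print(out.stdout.strip())  # -> ok
```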