# flyte-support
We have a Flyte setup with a number of different workflows, some dependent on each other. For now, we build a separate image for each workflow and register them one by one. When we want to run the whole pipeline, we start the workflows one by one from the UI and monitor progress; when one succeeds, we launch the next. Now we want to orchestrate the workflows programmatically. Our initial idea is to use FlyteRemote to start the latest production version of each workflow and, upon success, start the next one. This would be built into a higher-level "orchestration" Flyte workflow. The issue with this approach is that it takes a lot of boilerplate to launch each subworkflow. Additionally, monitoring at the orchestration-workflow level is not great: to see which tasks are running, one must find the subworkflow execution and monitor from there. We are wondering if there is a more "native" way of doing this. Some requirements for the setup:
• We must be able to use different images per workflow.
• It must be frictionless to develop the subworkflows as separate entities, and then, upon release, use the latest version in the orchestration flow.
The main issue we are running into: how do we serialize and register all of the subworkflows at the same time (so we can import them in the orchestration workflow) and use a different image for each (because we have different requirements for each)? Any ideas on the topic are welcome, thanks in advance!
---
Hi there! I'm on the product side at Union. In Flyte, I believe you can serialize/register everything in your source root directory by using the `--copy` flag in the Pyflyte CLI (docs). However, I don't think you actually need to register everything at the same time you run it: you should be able to create a new execution of an existing workflow (and the same for each subworkflow you want to execute) using `FlyteRemote` (docs). Note that if you want to run FlyteRemote inside a task (as in your higher-level Flyte workflow), you need to make sure the task is authenticated with Flyte. In terms of observability, agreed, this is somewhat limited, since Flyte doesn't have a notion that the workflows are related to each other. You can definitely use `FlyteRemote` to fetch the execution status of each of your subworkflows, but there is no meta-DAG in the UI. For your image management, I would look into ImageSpec, which lets you define your requirements in code and specify images at the task level, so you have lots of flexibility there. If you'd be open to looking at Union, we have built a more native way for workflows to trigger each other called Reactive Workflows (launch blog and docs).
---
Hi John, thanks a lot for your thoughts! Will check `--copy` and whether it could be useful 🙂 As mentioned, `FlyteRemote` is our initial thought as well. It's just a shame that we would lose out on observability and have to write a lot of custom logic to orchestrate the workflows. To me, this feels like exactly the kind of thing an orchestration tool should enable: building and running workflows separately and being able to chain them together in a neat way. I'll have a look at what you have built, looks cool!
---
> To me, it feels like this kind of thing would be exactly what should be enabled by an orchestration tool. Building and running workflows separately and being able to chain them together in a neat way.

Taking a step back, do you actually need to separate out the subworkflows? It is possible to run workflows of workflows in Flyte: https://docs.flyte.org/en/latest/user_guide/advanced_composition/subworkflows.html
Basically, if you are okay keeping everything in a single execution ID (which it seems like you want to do), you should be able to nest workflows as necessary. Also check out dynamic workflows: https://docs.flyte.org/en/latest/user_guide/advanced_composition/dynamic_workflows.html#dynamic-workflow
And, sorry to overload with information, but if you need to register those workflows separately, you can check out reference workflows, which let you use an entity that is already registered: https://docs.flyte.org/en/latest/api/flytekit/generated/flytekit.reference_workflow.html
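If reference workflows fit, the orchestrator could look roughly like this sketch; the project, domain, names, and versions are placeholders, and each stub's signature must match the registered workflow:

```python
from flytekit import reference_workflow, workflow

# Stubs pointing at already-registered subworkflows; the body is never run,
# only the reference (project/domain/name/version) matters.
@reference_workflow(
    project="my-project",        # placeholder
    domain="production",
    name="workflows.subwf1.wf",  # placeholder registered name
    version="v1",
)
def subwf1(run_date: str) -> str:
    ...

@reference_workflow(
    project="my-project",
    domain="production",
    name="workflows.subwf2.wf",
    version="v1",
)
def subwf2(dataset_uri: str) -> str:
    ...

# The orchestrator chains them; Flyte orders the nodes by data dependency.
@workflow
def orchestrator(run_date: str) -> str:
    dataset_uri = subwf1(run_date=run_date)
    return subwf2(dataset_uri=dataset_uri)
```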
---
Did not know about reference workflows!! Appreciate that, might just work for us. And good point, in theory we could just slam everything into one workflow. The issues with that would be (in my mind):
• Multiple devs work on different parts of the pipeline > versioning gets painful when everything is re-registered even though you actually changed only one part.
• We have very different requirements for different parts of the flow (e.g. GPU/CPU), so we would still need separate images linked to different parts of the orchestrated workflow. It seems a bit much to have to specify the image on each task in the flow.
Running "workflows of workflows" is exactly what we need, but we don't want all of the workflows to use the same image, and we want to keep being able to easily build and run the workflows separately.
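If I understand ImageSpec right, something like this sketch would let each subworkflow module carry its own image next to its tasks, so the orchestrator never has to know about it (the registry and packages here are made up):

```python
# workflows/subwf1/subwf1_wf.py  (hypothetical module from this thread)
from flytekit import ImageSpec, task, workflow

# Built and pushed automatically by pyflyte at registration time.
gpu_image = ImageSpec(
    name="subwf1",
    registry="ghcr.io/my-org",   # hypothetical registry
    packages=["torch==2.2.0"],   # hypothetical requirements
)

@task(container_image=gpu_image)
def train() -> str:
    return "s3://bucket/model"   # placeholder output

@workflow
def wf() -> str:
    return train()
```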
---
Flyte does let you set resources at the workflow level (as well as per domain, per project, and per task): https://docs.flyte.org/en/latest/deployment/configuration/customizable_resources.html
I would need to check whether the same is possible for images.
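For the per-task end of that, requests/limits can also be declared directly in code, independent of the platform-level overrides above; a small sketch (values illustrative):

```python
from flytekit import Resources, task

# Per-task resource requests/limits set in code rather than platform config.
@task(requests=Resources(cpu="2", mem="4Gi"), limits=Resources(cpu="4", mem="8Gi"))
def heavy_step(n: int) -> int:
    return n * 2
```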
---
We do our infra deployment through Argo CD and I'm not sure we can mix the resource configs with the application code 🤔 Do you know if this is possible: a project structure like
```
workflows/
  orchestration/
    orchestrator_wf.py
    Makefile
    Dockerfile
  subwf1/
    subwf1_wf.py
    Makefile
    Dockerfile
  subwf2/
    subwf2_wf.py
    Makefile
    Dockerfile
```
In orchestrator_wf.py we would import the workflows from `subwf1_wf.py` and `subwf2_wf.py`, and thus we would get the full DAGs in the UI. In orchestration/Makefile we would:
1. Build three images based on the Dockerfiles in `orchestration`, `subwf1`, and `subwf2`.
2. Package and register everything in a way where each workflow uses its own image when launched.
This is how I would ideally build this, but not sure if we can register in that manner. Also, this would enable developing the individual subworkflows without worrying about the main workflow.