# ask-the-community
f
Hello, in Airflow there is a mechanism (ExternalTaskSensor) for a DAG to wait until another DAG has successfully completed, and only then start. What is the mechanism in Flyte that achieves this?
k
You can write your own sensor
Or use an external trigger, e.g. from an AWS Lambda
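A minimal sketch of the external-trigger idea, assuming an AWS Lambda handler that launches a downstream launch plan via FlyteRemote once the upstream job signals completion; the endpoint, project, and launch plan names are placeholders, not from this thread:

```python
# Hypothetical Lambda handler: launches a downstream Flyte launch plan
# when invoked (e.g. by an upstream completion event).
from flytekit.remote import FlyteRemote
from flytekit.configuration import Config


def handler(event, context):
    remote = FlyteRemote(
        Config.for_endpoint("flyte.example.com"),  # placeholder endpoint
        default_project="flytesnacks",             # placeholder project
        default_domain="development",
    )
    lp = remote.fetch_launch_plan(name="downstream_wf_lp")  # placeholder name
    execution = remote.execute(lp, inputs={})
    return {"execution_id": execution.id.name}
```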
n
Hi @Frank Shen, the User Guide example for creating your own task plugin actually has a sensor example! https://docs.flyte.org/projects/cookbook/en/latest/auto/core/extend_flyte/user_container.html You can customize it for your own use case, e.g. you can configure it with `FlyteRemote` to wait for a task/workflow execution to complete instead of a file in S3 (which is what the example does)
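A minimal sketch of what Niels describes, assuming a task that takes an execution id as input and uses FlyteRemote to block until that execution reaches a terminal phase; the endpoint, project, and domain are placeholders:

```python
from flytekit import task
from flytekit.remote import FlyteRemote
from flytekit.configuration import Config


@task
def wait_for_execution(execution_id: str) -> None:
    remote = FlyteRemote(
        Config.for_endpoint("flyte.example.com"),  # placeholder endpoint
        default_project="flytesnacks",             # placeholder project
        default_domain="development",
    )
    execution = remote.fetch_execution(name=execution_id)
    # Poll until the referenced execution reaches a terminal phase.
    remote.wait(execution)
```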
k
also @Frank Shen, some fun stuff is coming soon; in fact there is an RFC that would make your sensors extremely efficient!!! https://hackmd.io/@pingsutw/B1a_Bnfqi
f
Thanks!
@Ketan (kumare3), RE: "Or use an external trigger from Lambda", could you elaborate or provide a link to the documentation? Thanks
f
@Niels Bantilan, could you point me to an example where I customize a task plugin and configure it with `FlyteRemote` to wait for a task from a different workflow execution to complete? Thanks!
k
what do you mean, a task plugin with FlyteRemote?
hey Frank, let me connect with you
I would love to understand what you want to do
and I think I can help do that
maybe directly as a backend plugin
@Frank Shen I understand you want to wait for a workflow to finish before launching another workflow. How would you know the `execution id` to wait for?
Do you want to pass the execution id as an input?
f
@Ketan (kumare3), That's a very good point. Since both workflows are scheduled jobs (via launch plans), one would not know the other's specific run's execution ID; rather, it knows the other's schedule and therefore the time delta between the two runs. Checking that time delta is the mechanism Airflow's ExternalTaskSensor uses, and ExternalTaskSensor is very widely used. I believe Flyte could do the same and fill the gap.
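For reference, the Airflow pattern being described looks roughly like this; `execution_delta` encodes the schedule offset between the two DAGs, and all names are illustrative:

```python
from datetime import timedelta
from airflow.sensors.external_task import ExternalTaskSensor

wait_for_upstream = ExternalTaskSensor(
    task_id="wait_for_upstream",
    external_dag_id="team_a_daily",      # the upstream DAG's id
    external_task_id=None,               # None waits on the whole DAG run
    execution_delta=timedelta(hours=1),  # upstream is scheduled 1h earlier
    mode="reschedule",                   # free the worker slot while waiting
)
```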
k
Hmm, tbh I find it very confusing
And reproducibility suffers
But let me think about what we can do that just works
Cc @Yee
m
I am investigating a setup that relates to this as well. Potentially I am just missing some pieces, so I'll paint a picture. Imagine Team A collects data with their own launch plan, workflow, and tasks. Then imagine that Team B runs their own workflow, but one of its tasks depends on the output of a task in A's workflow.

Now, if A realizes that their input data, or their processing of said data, is bad, we would like to re-run everything downstream of the bad task. That could be all of A's and all of B's tasks (in Airflow, B is tied to A via a name + time reference).

I think this scenario is one of the major benefits of using a workflow orchestrator. Cross-team, cross-workflow dependencies are super unruly and heavy to manage if they can't be explicitly orchestrated. Airflow's "rerun everything up/downstream and/or future/past executions" functionality is a huge time saver; it is easy to understand and helps solve all the "I fixed an issue with task x up to/from time y" scenarios.

EDIT: Ketan, I think your new backfill feature hits at least some of these areas. Maybe you already know how to solve the problem 😉
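One hedged sketch of making the cross-team dependency explicit rather than sensed: Team B can reference Team A's registered launch plan with flytekit's `reference_launch_plan` and compose it into a single workflow, so a rerun propagates through one DAG. All project/name/version strings here are invented:

```python
from flytekit import task, workflow
from flytekit import reference_launch_plan


@reference_launch_plan(
    project="team_a",
    domain="production",
    name="team_a.workflows.collect_data_lp",  # invented name
    version="v1",
)
def collect_data() -> str:
    # Signature must match Team A's registered launch plan.
    ...


@task
def process(data_uri: str) -> str:
    # Team B's step, consuming Team A's output.
    return data_uri + "/processed"


@workflow
def team_b_pipeline() -> str:
    return process(data_uri=collect_data())
```

This does not cover the pure time-delta sensing case described above, but it does make the cross-team dependency visible to the orchestrator.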