# ask-the-community
f
Hello, in Airflow there is a mechanism (ExternalTaskSensor) for a DAG to wait until another DAG has successfully completed, and only then start. What is the mechanism in Flyte that achieves this?
k
You can write your own sensor
Or use an external trigger, e.g. from an AWS Lambda
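A minimal sketch of the external-trigger idea, assuming an AWS Lambda handler that launches a downstream launch plan via FlyteRemote once the upstream job signals completion; the endpoint, project, and launch plan names are placeholders, not from this thread:

```python
# Hypothetical Lambda handler: launches a downstream Flyte launch plan
# when invoked (e.g. by an upstream completion event).
from flytekit.remote import FlyteRemote
from flytekit.configuration import Config


def handler(event, context):
    remote = FlyteRemote(
        Config.for_endpoint("flyte.example.com"),  # placeholder endpoint
        default_project="flytesnacks",             # placeholder project
        default_domain="development",
    )
    lp = remote.fetch_launch_plan(name="downstream_wf_lp")  # placeholder name
    execution = remote.execute(lp, inputs={})
    return {"execution_id": execution.id.name}
```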
n
Hi @Frank Shen, the User Guide example for creating your own task plugin actually has a sensor example! https://docs.flyte.org/projects/cookbook/en/latest/auto/core/extend_flyte/user_container.html You can customize it for your own use case, e.g. you can configure it with `FlyteRemote` to wait for a task/workflow execution to complete instead of a file in S3 (which is what the example does)
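A minimal sketch of what Niels describes, assuming a task that takes an execution id as input and uses FlyteRemote to block until that execution reaches a terminal phase; the endpoint, project, and domain are placeholders:

```python
from flytekit import task
from flytekit.remote import FlyteRemote
from flytekit.configuration import Config


@task
def wait_for_execution(execution_id: str) -> None:
    remote = FlyteRemote(
        Config.for_endpoint("flyte.example.com"),  # placeholder endpoint
        default_project="flytesnacks",             # placeholder project
        default_domain="development",
    )
    execution = remote.fetch_execution(name=execution_id)
    # Poll until the referenced execution reaches a terminal phase.
    remote.wait(execution)
```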
k
also @Frank Shen, some fun stuff is coming soon; in fact there is an RFC that would make your sensors extremely efficient!!! https://hackmd.io/@pingsutw/B1a_Bnfqi
f
Thanks!
@Ketan (kumare3), RE: "Or use an external trigger from Lambda", could you elaborate or provide a link to the documentation? Thanks
f
@Niels Bantilan, could you point me to an example where I customize a task plugin and configure it with `FlyteRemote` to wait for a task from a different workflow execution to complete? Thanks!
k
what do you mean, a task plugin with FlyteRemote?
hey Frank, let me connect with you
I would love to understand what you want to do
and I think I can help do that
maybe directly as a backend plugin
@Frank Shen I understand you want to wait for a workflow to finish before launching another workflow. How would you know the `execution id` to wait for?
Do you want to pass the execution id as an input?
f
@Ketan (kumare3), That's a very good point. Since both workflows are scheduled jobs (via launch plans), one would not know the other's specific run's execution ID; rather, it knows the other's schedule and therefore the time delta between the two runs. Checking that time delta is the mechanism Airflow's ExternalTaskSensor uses, and ExternalTaskSensor is very widely used. I believe Flyte could do the same and fill the gap.
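For reference, the Airflow pattern being described looks roughly like this; `execution_delta` encodes the schedule offset between the two DAGs, and all names are illustrative:

```python
from datetime import timedelta
from airflow.sensors.external_task import ExternalTaskSensor

wait_for_upstream = ExternalTaskSensor(
    task_id="wait_for_upstream",
    external_dag_id="team_a_daily",      # the upstream DAG's id
    external_task_id=None,               # None waits on the whole DAG run
    execution_delta=timedelta(hours=1),  # upstream is scheduled 1h earlier
    mode="reschedule",                   # free the worker slot while waiting
)
```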
k
Hmm, tbh I find it very confusing
And reproducibility suffers
But let me think about what we can do that just works
Cc @Yee
m
I am investigating a setup that relates to this as well. Potentially I am just missing some pieces, so I'll paint a picture. Imagine Team A collects data with their own launch plan, workflow, and tasks. Then imagine that Team B runs their own workflow, but one of its tasks depends on the output of a task in A's workflow.

Now, if A realizes that their input data, or their processing of said data, is bad, we would like to re-run everything downstream of the bad task. That could be all of A's and all of B's tasks (in Airflow, B is tied to A via a name + time reference).

I think this scenario is one of the major benefits of using a workflow orchestrator. Cross-team, cross-workflow dependencies are super unruly and heavy to manage if they can't be explicitly orchestrated. Airflow's "rerun everything up/downstream and/or future/past executions" functionality is a huge time saver; it is easy to understand and helps solve all the "I fixed an issue with task x up to/from time y" scenarios.

EDIT: Ketan, I think your new backfill feature hits at least some of these areas. Maybe you already know how to solve the problem 😉
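One hedged sketch of making the cross-team dependency explicit rather than sensed: Team B can reference Team A's registered launch plan with flytekit's `reference_launch_plan` and compose it into a single workflow, so a rerun propagates through one DAG. All project/name/version strings here are invented:

```python
from flytekit import task, workflow
from flytekit import reference_launch_plan


@reference_launch_plan(
    project="team_a",
    domain="production",
    name="team_a.workflows.collect_data_lp",  # invented name
    version="v1",
)
def collect_data() -> str:
    # Signature must match Team A's registered launch plan.
    ...


@task
def process(data_uri: str) -> str:
    # Team B's step, consuming Team A's output.
    return data_uri + "/processed"


@workflow
def team_b_pipeline() -> str:
    return process(data_uri=collect_data())
```

This does not cover the pure time-delta sensing case described above, but it does make the cross-team dependency visible to the orchestrator.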