Hey folks We have a use cases to trigger Flyte workflows whe Flyte #flyte-support

Hey folks! We have a use cases to trigger Flyte wo...

creamy-policeman-62284

08/27/2024, 8:43 PM

Hey folks! We have a use cases to trigger Flyte workflows when some external dependency is ready, for example, a Iceberg/Parquet dataset was updated or there's a new event. Is there a way to configure Flyte's scheduler to listen to these events? From the docs, it looks like this use case not supported atm.

thankful-minister-83577

08/27/2024, 8:50 PM

this is not currently possible in the general case. this is partially supported in the Union deployed variant of flyte but not the oss platform by itself.

thankful-minister-83577

08/27/2024, 8:51 PM

the reason for this though is mostly because this is largely an orthogonal exercise… what you’re describing in the general case is a webhook interface, that can watch arbitrary files, and launch tasks & workflows as a result. the actual kicking off of a workflow is nothing more than a grpc call with the provided client.

thankful-minister-83577

08/27/2024, 8:51 PM

the webhook system would need to offer a seamless way to customize the watching of files across any number of sources, handle long-lived authentication/authorization, etc.

thankful-minister-83577

08/27/2024, 8:52 PM

to achieve what you want with the tools available today, I think you should look at https://docs.flyte.org/en/latest/deployment/agents/sensor.html#deployment-agent-setup-sensor

thankful-minister-83577

08/27/2024, 8:52 PM

sensor tasks, if you follow them up with other tasks, are a way to respond to files and other resources landing

thankful-minister-83577

08/27/2024, 8:53 PM

see https://github.com/flyteorg/flytesnacks/blob/master/examples/sensor/sensor/file_sensor_example.py

creamy-policeman-62284

08/27/2024, 8:54 PM

Thanks! The Sensor task seems to cover the use case. Curious, what's not supported yet?

thankful-minister-83577

08/27/2024, 8:55 PM

i guess beyond the semantics of it… it’s not really reactive. that’s probably the best way to describe it

thankful-minister-83577

08/27/2024, 8:55 PM

you, or a schedule, still have to kick off the workflow.

thankful-minister-83577

08/27/2024, 8:56 PM

like it’s still a different paradigm from the fully-featured triggering/webhook system you might imagine

thankful-minister-83577

08/27/2024, 8:57 PM

but again, it’s kinda hard to justify building that as a flyte thing

creamy-policeman-62284

08/27/2024, 8:57 PM

Is it fair to say that this seems very similar to Airflow? The DAG schedules the workflow and within the task users have specify the upstream task they want to depend on.

thankful-minister-83577

08/27/2024, 8:58 PM

possibly? i mean it wasn’t copied from airflow, but if the semantics feel familiar then so much the better.

creamy-policeman-62284

08/27/2024, 9:00 PM

Fair. Would you know what will happen in this case: say a Flyte wf is scheduled to run every hour and it depends on some S3 file to be present. For whatever reason, the S3 file creation is delayed by 2 hours. 2 hours later, will there be 2 workflow instances running?

glamorous-carpet-83516

08/27/2024, 9:00 PM

yup, you could create a custom sensor that watch those events. https://github.com/flyteorg/flytekit/blob/master/flytekit/sensor/file_sensor.py#L5-L14

👀 1

glamorous-carpet-83516

08/27/2024, 9:03 PM

2 hours later, will there be 2 workflow instances running?

for now yes, but there is a RFC about execution concurrency that aims to address this issue. https://github.com/flyteorg/flyte/pull/5659/files

thankful-minister-83577

08/27/2024, 9:05 PM

basically if you know how many files will land, or if there is a regular cadence, then you can use this approach.

creamy-policeman-62284

08/27/2024, 9:06 PM

These resources are super helpful! I have few more question

this [event based scheduling] is partially supported in the Union deployed variant of flyte but not the oss platform by itself.

Curious to understand what's the gap in Sensors

thankful-minister-83577

08/27/2024, 9:06 PM

but if the goal is to react to an unknown number of events, then this pattern becomes a lot harder. you’d basically have to kick off a new execution at the end of each one.

thankful-minister-83577

08/27/2024, 9:08 PM

Union does provide a triggering mechanism (that can handle this unknown-count case), but the triggering mechanism is through the higher order construct of an Artifact. To complete the full system you probably have in mind, you’d need a lambda function or something that can take arbitrary incoming files/tables/etc, and create an artifact instance out of them

creamy-policeman-62284

08/27/2024, 9:12 PM

Got it! Is there a public facing tracker to track this and RFC implementation?

thankful-minister-83577

08/27/2024, 9:14 PM

well the union effort is closed source only (because of the reliance on additional infrastructure).

thankful-minister-83577

08/27/2024, 9:17 PM

the amount of additional infrastructure, and the additional support burden on flyte contributors, make it a hard sell i think for such a system to be bundled in with core flyte. but over time, who knows, maybe it will make sense to develop something. currently there is not an rfc. feel free to start an rfc incubator discussion though. maybe over time ideas can be collected there.

creamy-policeman-62284

08/27/2024, 9:22 PM

Oh you are referring to this. Another tactical question, I wonder if I can wire multiple sensors. For example,

sensor(file_1_ready) && sensor(file_2_ready) >> first_task

file_1

file_2

sensors aren't true at the same time then

first_task

won't be triggered, ie, both files should land at same time and both sensors should evaluate at same time. Is my understanding right?

thankful-minister-83577

08/27/2024, 9:25 PM

i think you can just chain them… but yeah, or

Copy code

s1 = sensor(...)
s2 = sensor(...)
s1 >> t1
s2 >> t1

check that but i think that also works

creamy-policeman-62284

08/27/2024, 9:25 PM

Interesting, I'd have read this as, if either s1 or s2 are true then trigger t1 🤔

thankful-minister-83577

08/27/2024, 9:26 PM

both will block t1, the >> operator just creates an artificial dependency when there is no data dependency

creamy-policeman-62284

08/27/2024, 9:29 PM

Got it. Last question, why is the scheduling considered orthogonal to Flyte's use case?

thankful-minister-83577

08/27/2024, 9:34 PM

scheduling is not… normal cron scheduling is built in

creamy-policeman-62284

08/28/2024, 2:38 PM

Ah - I see. Event based triggering is orthogonal.

creamy-policeman-62284

08/28/2024, 2:38 PM

Thank you for all the resources.

60 Views

Open in Slack

Previous Next