Looking for some advice for integrating a task into a Flyte Flyte #flyte-support

Looking for some advice for integrating a task int...

helpful-shampoo-3525

10/02/2023, 5:24 PM

Looking for some advice for integrating a task into a Flyte workflow. This task that is currently initiated by cron. This task scales horizontally with the number of messages in an SQS queue. I am currently using KEDA and AWS ASG of spot instances to achieve this. There can be > 10k messages in the queue when the task is started. This task does ~30 seconds setup when started. I was initially looking at using map_task to run this, but given the number of messages and duration of setup, I am thinking this is not a good fit. Thoughts? Thanks!

magnificent-teacher-86590

10/02/2023, 6:26 PM

interesting, so if its a cron job does it trigger >10k message at a given time? i think map_task is probably better than dynamic in that case.

magnificent-teacher-86590

10/02/2023, 6:26 PM

also how compute intensive are they, if its light is can it be just done inside a single task or batch them and process multiple messages in a single task

helpful-shampoo-3525

10/02/2023, 6:48 PM

A separate system is continuously populating the queue, but the cron job that kicks off the task runs hourly. Each message represents multiple "units of work" (essentially pointers to objects in s3). The message is packed with "units of work" up to the 256k message size limit. The workload is very cpu intensive.

magnificent-teacher-86590

10/02/2023, 7:56 PM

i see, in that case i think map task should work well over dynamic task. @thankful-minister-83577 do you have any insight? is map_task faster to setup?

thankful-minister-83577

10/02/2023, 9:44 PM

i don’t think i’ve seen this use-case before, but what’s the concern with map task? the start-up duration doesn’t matter that much right? you can just spin up like 5-10 tasks to read the 10k right?

thankful-minister-83577

10/02/2023, 9:45 PM

i think the issues that can arise are more nuanced in this case - what’s the stopping criteria? will new messages hit the queue while you’re processing them, are there any concerns around concurrency of multiple messages being processed at the same time, etc

thankful-minister-83577

10/02/2023, 9:46 PM

what’s the use case btw?

freezing-airport-6809

10/02/2023, 10:44 PM

@helpful-shampoo-3525 I would implement a Flyte Agent or a python task, that is up reading from a queue and working on it

freezing-airport-6809

10/02/2023, 10:45 PM

there would be no KEDA, in this case, as Flyte does not integrate with it today. We plan to in the future

freezing-airport-6809

10/02/2023, 10:45 PM

FYI @high-park-82026 / @hallowed-mouse-14616 (no comments needed, just FYI)

helpful-shampoo-3525

10/03/2023, 2:00 AM

@thankful-minister-83577, I should have been more specific the external system is producing a stream of data, but writing to the queue in bulk at the start of each hour. The use case is can be thought of as stream processing. The consumers of our data can tolerate ~1 hour of delay before data is available to them. The setup described above ends up being very reliable and cost-effective. I appreciate the responses!

helpful-shampoo-3525

10/03/2023, 2:00 AM

@freezing-airport-6809, thanks! Flyte Agents look interesting and would do most of what I am looking for. Also good to know about potential integration with KEDA in the future.

high-park-82026

10/05/2023, 4:46 PM

Late to the party, I think what you, @helpful-shampoo-3525, have built is just fine! You are using KEDA to scale up based on pod events, right? an integration with flyte isn't strictly required but of course it can speed up scaling the cluster so it's ready for when the pods land. Are there concerns about partial failures? What if a pod starts, reads a message, processes a few files before some catastrophic event happens (e.g. the spot machine disappears or the pod crashes)? Are the operations that pod perform idempotent (ideally)?

3 Views

Open in Slack

Previous Next