• Rahul Mehta

    3 months ago
    Hey folks, currently investigating Flyte as an alternative to our current setup which uses Kubeflow/Argo -- we have several pipelines with a very high degree of fan-out (on the order of ~700-1000 individual tasks). Does anyone have data points/benchmarks on how Flyte scales wrt the size of the workflow? Many of the features in the SDK (especially how flytekit works w/ native python types) look quite promising and could eliminate a lot of workarounds we've implemented to use Kubeflow.
  • Dan Rammer (hamersaw)

    3 months ago
    Hey @Rahul Mehta, this is a very good question. I can say a few things. Currently Flyte stores workflows as k8s CRDs, so the 1.5MB limit on etcd means that workflow size is bounded. Generally, this means a single workflow can fanout to 10k+, but this is not a hard number as it is contingent on task ID lengths, etc.
  • That being said, through the use of launch plans you can break a workflow into multiple CRDs in the backend. So by layering launch plans, a single workflow is essentially unbounded in size.
  • Rahul Mehta

    3 months ago
    Ah fantastic -- to confirm, does this mean a subworkflow will use a separate CRD? Also, Argo supports offloading large workflows to an external DB to get around the 1.5MB limit for K8s resources; does Flyte support something similar?
  • Dan Rammer (hamersaw)

    3 months ago
    If you launch a subworkflow directly, it will not. If you launch it via a launch plan, it will. This is terminology we are actively trying to clarify, since the impact of the distinction is not obvious.
  • Rahul Mehta

    3 months ago
    Gotcha, good to know. I'm planning on running some stress tests to compare vanilla Argo, Kubeflow (on Argo), Metaflow (on Argo) and Flyte soon so will keep that in mind when doing so.
  • Dan Rammer (hamersaw)

    3 months ago
    Currently we are exploring options to offload to an external DB (actually hoping to transition completely away from etcd because of the aforementioned limitations). Starting with large workflows only may be a good stepping stone.
  • Rahul Mehta

    3 months ago
  • Dan Rammer (hamersaw)

    3 months ago
    Oh absolutely, please reach out with any questions! And we would love to see the stress-test results. We're seeing a lot of interest in these large-scale use cases, so there is plenty of interesting work to do to support them.
  • Ketan (kumare3)

    3 months ago
    also @Rahul Mehta & @Dan Rammer (hamersaw) we do support the concept of map tasks, which are built specifically for fan-outs - Examples here
  • there are cases in which we have one workflow with multiple map tasks, each of which can have thousands (maybe around 5k) of tasks
  • in the backend they are optimized for low-overhead storage, so this allows you to scale quite a bit
  • Rahul Mehta

    3 months ago
    That sounds like exactly what we're looking for (fwiw, our use-case is backtesting models w/ quarterly data, which requires running a full model training pipeline for each period's worth of data)
  • Ketan (kumare3)

    3 months ago
    ya we used to do that all the time
  • Rahul let me know if you want to have a quick chat
  • Rahul Mehta

    3 months ago
    Happy to, will DM you.
  • Ketan (kumare3)

    3 months ago
    also, you will really like Caching (memoization)
  • and recover mode