# announcements
r
Hey folks, currently investigating Flyte as an alternative to our current setup which uses Kubeflow/Argo -- we have several pipelines with a very high degree of fan-out (on the order of ~700-1000 individual tasks). Does anyone have data points/benchmarks on how Flyte scales wrt the size of the workflow? Many of the features in the SDK (especially how flytekit works w/ native python types) look quite promising and could eliminate a lot of workarounds we've implemented to use Kubeflow.
d
Hey @Rahul Mehta, this is a very good question. I can say a few things. Currently Flyte stores workflows as k8s CRDs, so the 1.5MB limit on etcd means that workflow size is bounded. Generally, this means a single workflow can fanout to 10k+, but this is not a hard number as it is contingent on task ID lengths, etc.
👍 1
That being said, through the use of launch plans you can break a workflow into multiple CRDs in the backend. So by using separate tiers, a single workflow is essentially unbounded in size.
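As a rough back-of-the-envelope sketch of the size bound described above: if a whole workflow CRD has to fit in about 1.5MB, the achievable fan-out depends on how many bytes each node costs. The per-node byte figures below are illustrative assumptions, not measured Flyte numbers; roughly 150 bytes per node is simply what a 10k fan-out within 1.5MB would imply.

```python
# Back-of-the-envelope estimate; per-node sizes are assumed, not measured.
ETCD_LIMIT_BYTES = int(1.5 * 1024 * 1024)  # the ~1.5MB limit from the thread


def max_fanout(bytes_per_node: int, limit_bytes: int = ETCD_LIMIT_BYTES) -> int:
    """Return how many nodes fit in one CRD at a given (assumed) per-node cost."""
    if bytes_per_node <= 0:
        raise ValueError("bytes_per_node must be positive")
    return limit_bytes // bytes_per_node


for size in (150, 300, 1500):
    print(f"{size} bytes/node -> about {max_fanout(size)} nodes")
```

Longer task IDs or more metadata per node push the per-node cost up and the bound down, which is why the 10k figure is not a hard number.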
r
Ah fantastic -- to confirm, does this mean a subworkflow will use a separate CRD? Also, Argo supports offloading large workflows to an external DB to get around the 1.5MB limit for K8s resources; does Flyte support something similar?
d
If you launch a subworkflow directly it will not. If you launch a subworkflow using a launch plan it will. This is terminology we are certainly trying to clarify, as its impact is not clear.
r
Gotcha, good to know. I'm planning on running some stress tests to compare vanilla Argo, Kubeflow (on Argo), Metaflow (on Argo) and Flyte soon so will keep that in mind when doing so.
d
Currently we are exploring options to offload to an external DB (actually hoping to transition completely away from etcd because of the aforementioned limitations). Maybe starting with large workflows only is a great stepping stone.
d
Oh absolutely, please reach out with any questions! And we would love to see the stress test results. We're seeing a lot of interest in these large-scale use cases, so there is plenty of interesting work to do in support.
🙏 1
k
also @Rahul Mehta & @Dan Rammer (hamersaw) we do support a concept of Map tasks, that are built specifically for fanouts - Examples here
there are cases in which we have one workflow with multiple map tasks, each of which can have 1000s of tasks (maybe around 5k)
in the backend they are optimized for low-overhead storage, so this can allow you to scale quite a bit
r
That sounds like exactly what we're looking for (fwiw, our use-case is backtesting models w/ quarterly data, which requires running a full model training pipeline for each period's worth of data)
k
ya we used to do that all the time
Rahul let me know if you want to have a quick chat
r
Happy to, will DM you.
k
also, you will really like - Caching (memoization)
and recover mode