Flyte enables production-grade orchestration for machine learning workflows and data processing. It was created to accelerate taking local workflows to production.

Flyte

Hello - I'm interested in scaling performance benchmarks, especially with respect to RDBMS setups. Ideally, both capacity and latency measurements, broken down by task and by workflow. Does anything like this exist? FWIW, I'm evaluating for a target of 20-40M tasks/day.
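For a sense of scale, a daily target can be converted into an average sustained rate. The sketch below does only that division; `tasks_per_second` is a hypothetical helper, and real traffic is usually bursty, so peak rates will be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def tasks_per_second(tasks_per_day: float) -> float:
    """Average sustained task rate implied by a daily volume (ignores bursts)."""
    return tasks_per_day / SECONDS_PER_DAY


for daily in (20_000_000, 40_000_000):
    rate = tasks_per_second(daily)
    print(f"{daily:>12,} tasks/day ~ {rate:,.0f} tasks/s on average")
```

So 20-40M tasks/day works out to roughly 230-460 tasks per second around the clock, before accounting for peaks.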

I would love to hop on a call and discuss.

<@U04TJR2CA7R> Flyte is designed to scale. The real bottleneck is etcd, specifically the rate of updates, so we suggest flytepropeller sharding and right-sizing the kubeclient config. The other part is the event rate and the event store, which is Postgres. The problem with Postgres is that as the number of rows increases, both inserts and lookups may slow down, so partition keys might be important.
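Sharding spreads workflows across several propeller instances so that no single instance drives all of the etcd updates. The sketch below illustrates the general idea of hash-based sharding: a stable hash of a workflow ID picks one of N shards. The `shard_for` function and the execution IDs are hypothetical; this is not FlytePropeller's actual implementation or configuration.

```python
import hashlib
from collections import Counter


def shard_for(workflow_id: str, shard_count: int) -> int:
    """Map a workflow ID to a shard index in [0, shard_count).

    Uses a stable digest rather than Python's per-process salted hash(), so
    every process assigns the same workflow to the same shard.
    """
    digest = hashlib.sha256(workflow_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical execution IDs, used only to check how evenly they spread.
ids = [f"exec-{i}" for i in range(10_000)]
load = Counter(shard_for(wf, 4) for wf in ids)
print(sorted(load.items()))
```

A stable hash keeps assignments deterministic, so each workflow is always handled by the same shard and its updates are not split across instances.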

But task types also matter. For example, Flyte automatically compresses map tasks so they are far more efficient: a map task only really takes one row in the DB, and in etcd its state is a bitmap.
To scale, I suggest adding a large DB and using multiple k8s clusters.
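Keeping each map-task subtask's state in a few bits, rather than as its own record, is what keeps the footprint small. The following is a hypothetical illustration of packing per-subtask phases into a single integer bitmap; the `Phase` values, the 2-bit width, and the `PhaseBitmap` class are assumptions, not Flyte's actual encoding.

```python
from enum import IntEnum


class Phase(IntEnum):
    # Hypothetical subtask phases; four values fit in 2 bits each.
    QUEUED = 0
    RUNNING = 1
    SUCCEEDED = 2
    FAILED = 3


BITS_PER_ITEM = 2
MASK = (1 << BITS_PER_ITEM) - 1


class PhaseBitmap:
    """Packs one small phase value per subtask into a single Python int."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.bits = 0  # every subtask starts as QUEUED (0)

    def set(self, index: int, phase: Phase) -> None:
        if not 0 <= index < self.size:
            raise IndexError(index)
        shift = index * BITS_PER_ITEM
        # Clear the slot for this subtask, then write the new phase into it.
        self.bits = (self.bits & ~(MASK << shift)) | (int(phase) << shift)

    def get(self, index: int) -> Phase:
        if not 0 <= index < self.size:
            raise IndexError(index)
        return Phase((self.bits >> (index * BITS_PER_ITEM)) & MASK)

    def nbytes(self) -> int:
        # Storage needed for all subtasks, rounded up to whole bytes.
        return (self.size * BITS_PER_ITEM + 7) // 8


bm = PhaseBitmap(10_000)
bm.set(42, Phase.RUNNING)
bm.set(9_999, Phase.SUCCEEDED)
print(bm.get(42).name, bm.get(9_999).name, bm.nbytes())  # RUNNING SUCCEEDED 2500
```

In this sketch, 10,000 subtasks fit in 2,500 bytes, instead of needing one row or status object per subtask.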

Also, if you don't want to worry about this, we at Union would love to work with you :heart:

Also, in Union we have a way to reuse existing containers, which dramatically improves performance and scale on k8s.