There are several good comparison articles between Flyte and Airflow, but I haven't seen many comparisons to other orchestrators. I'm particularly interested in Dagster. Any plans to add something like this to a FAQ or blog post?
Airflow and Kubeflow are quite different from Flyte
I'm particularly interested in Dagster because there are more commonalities
But probably also a lot of differences
12/15/2022, 12:34 AM
@Adrian Garcia Badaracco there are a lot of differences with Dagster. We are more vertically integrated with compute
cc @Pradithya Aria Pura can you help here?
Flyte was designed with ephemeral infrastructure in mind. It can definitely work for data workflows, but it really shines where there is ML in the picture
The infrastructure handling in Flyte is far superior in our opinion
Pradithya Aria Pura
12/15/2022, 12:47 AM
In terms of the low-level user API (task, workflow, op), I think it's quite similar. But Dagster recently added a higher-level API that they call software-defined assets. It's more opinionated, but some might find it useful, especially for ETL. I think it's also doable in Flyte.
I agree with @Ketan (kumare3): the key differentiator of Flyte is its scalability and pluggability. It's hard to achieve multitenancy in Dagster (at least from what I experienced a while back), and plugins can only be written on the front end as Python tasks. In Flyte there is a backend plugin concept.
Adrian Garcia Badaracco
12/15/2022, 1:23 AM
Gotcha, that is a helpful differentiation. It does seem that Dagster is generally much more Python-centric, whereas Flyte tries to be language agnostic.
How about in terms of executing user code? It seems like Dagster expects you to start a container that runs a gRPC server, which the central scheduler hits to discover all of the tasks/workflows/ops. It then runs each op by starting a new pod (on k8s at least) using the same image, but this time it hits the gRPC server to say "run this op with these parameters".
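Roughly, the flow I have in mind looks like the toy sketch below. It is plain Python with no real gRPC and no Dagster APIs: `CodeServer`, `list_ops`, `run_op` and `launch_pod_and_run` are all made-up names, and the "pod" is just a fresh object. It only illustrates the two steps as I understand them, discovery first, then one launch per op from the same image.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry standing in for "the ops baked into the user image".
OPS: Dict[str, Callable[..., Any]] = {}


def op(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as an op under its own name (illustrative only)."""
    OPS[fn.__name__] = fn
    return fn


@op
def add(a: int, b: int) -> int:
    return a + b


@op
def shout(text: str) -> str:
    return text.upper() + "!"


class CodeServer:
    """Plays the role of the gRPC server started from the user image."""

    def list_ops(self) -> List[str]:
        # Step 1: the central scheduler asks what this image contains.
        return sorted(OPS)

    def run_op(self, name: str, params: Dict[str, Any]) -> Any:
        # Step 2: "run this op with these parameters".
        return OPS[name](**params)


def launch_pod_and_run(name: str, params: Dict[str, Any]) -> Any:
    """Stand-in for starting a new pod from the same image for a single op."""
    server = CodeServer()  # in the real setup this would be a new container
    return server.run_op(name, params)


discovered = CodeServer().list_ops()
result = launch_pod_and_run("add", {"a": 2, "b": 3})
print(discovered, result)
```

The point of the sketch is that the scheduler never imports user code itself: it only sees op names and parameters, and everything else lives in the image.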
Pradithya Aria Pura
12/15/2022, 1:30 AM
Oh yeah, that part feels clunky to me. Is it still like that? All user code is stored in a deployment instead of persistent storage. And to have a different set of dependencies, you'll need a separate deployment.