# ask-the-community
a
There are several good comparison articles between Flyte and Airflow, but I haven't seen many comparisons to other orchestrators. I'm particularly interested in Dagster. Any plans to add something like this to a FAQ or blog post?
s
We have one comparing Flyte with Kubeflow: https://www.union.ai/blog-post/production-grade-ml-pipelines-flyte-vs-kubeflow. We definitely can put together a doc to list out the differences between Flyte and Dagster but that might take some time. @Martin Stein @Ketan (kumare3), what do you think?
a
Airflow and Kubeflow are quite different from Flyte
I'm particularly interested in Dagster because there are more commonalities
But probably also a lot of differences
k
@Adrian Garcia Badaracco there are a lot of differences with Dagster. Flyte is more of a vertically integrated compute platform
cc @Pradithya Aria Pura can you help here.
Flyte was designed with ephemeral infrastructure in mind. It can definitely work for data workflows, but it really shines when there is ML in the picture
The infrastructure handling in Flyte is far superior in our opinion
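To make that concrete, here's a rough sketch of how per-task infrastructure is declared in flytekit (the resource numbers are just illustrative). Flyte launches a pod sized to each task's request and tears it down when the task finishes:

```python
from flytekit import task, workflow, Resources

# Each task declares the infrastructure it needs; Flyte provisions an
# ephemeral pod with these resources for the task and reclaims it afterward.
@task(requests=Resources(cpu="2", mem="4Gi"), limits=Resources(cpu="4", mem="8Gi"))
def train_model(epochs: int) -> float:
    # Placeholder for an actual training step.
    return 0.95

@workflow
def training_wf(epochs: int = 10) -> float:
    return train_model(epochs=epochs)
```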
p
In terms of the low-level user API (task, workflow, op), I think they're quite similar. But Dagster recently added a higher-level API that they call software-defined assets. It's more opinionated, but some might find it useful, especially for ETL, and I think it's also doable in Flyte. I agree with @Ketan (kumare3): the key differentiators of Flyte are its scalability and pluggability. It's hard to achieve multitenancy in Dagster (at least from what I experienced a while back), and plugins can only be written on the frontend as Python tasks, whereas Flyte also has a backend plugin concept. A sketch of the API similarity is below.
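Roughly, the Flyte side of that low-level API looks like this (a minimal sketch; the ETL names are made up, and the Dagster mapping in the comments is approximate):

```python
from typing import List
from flytekit import task, workflow

# Flyte's @task is roughly Dagster's @op; @workflow is roughly @job.
# Dagster's @asset (software-defined assets) has no direct Flyte
# decorator, but the same pattern can be expressed as tasks that
# materialize their outputs.
@task
def extract() -> List[int]:
    return [1, 2, 3]

@task
def transform(rows: List[int]) -> int:
    return sum(rows)

@workflow
def etl() -> int:
    return transform(rows=extract())
```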
a
Gotcha, that is a helpful differentiation. It does seem that Dagster is generally much more Python-centric, whereas Flyte tries to be language-agnostic.
How about in terms of executing user code? It seems like Dagster expects you to start a container running a gRPC server that the central scheduler hits to discover all of the tasks/workflows/ops, and then it runs each op by starting a new pod (on k8s at least) from the same image, this time hitting the gRPC server to say "run this op with these parameters".
p
Oh yeah, that part feels clunky to me. Is it still like that? All user code is stored in a deployment instead of persistent storage, and to get a different set of dependencies you need a separate deployment.
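In Flyte, by contrast, dependencies can be isolated per task rather than per deployment, because each task can point at its own container image. A sketch (the image names are hypothetical):

```python
from flytekit import task, workflow

# Each task can run in its own image, so tasks with conflicting
# dependency sets can live in the same workflow without needing
# separate deployments.
@task(container_image="ghcr.io/example/pandas-image:latest")
def load() -> int:
    return 42

@task(container_image="ghcr.io/example/torch-image:latest")
def score(x: int) -> int:
    return x + 1

@workflow
def pipeline() -> int:
    return score(x=load())
```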