# flyte-support
c
Hey we're looking for a workflow orchestrator to build some of our data pipelines. We were originally on Prefect, but encountered several bugs/limitations, so we're looking to switch. We're currently debating between Dagster, Restate, and Flyte, but there aren't many resources to help us differentiate between the three. Some more context on our needs:
• Backfill operation on several sources of data (not very demanding, just ~100 GB) + some simple data transformations afterwards
• Workflow-oriented jobs to process our data further (example: file download/OCR/screenshots). This is currently very demanding. Each job varies in length (around 5-10 minutes), and we need to run hundreds of them at the same time. These jobs need to run both as a backfill (totaling a few hundred thousand jobs over a few days), and afterwards continuously (~100-200 jobs per day or so)
s
Hey Charles, I'm a PM at Union and also focus a lot on Flyte. Could you describe what issues/limitations you had with Prefect? This might help answer whether Flyte could be more effective in those specific areas.
c
Hey John! Thanks for the quick response. The biggest problem we encountered with Prefect was that it seemed to be really buggy with long running jobs. It would lose connection to our Azure containers randomly, and when we cancelled jobs or jobs failed, a few tasks/flows would persist that we would have to manually clean up ourselves. Other than that, we also had some issues with reliability & traceability. We weren't able to retry individual tasks or store metadata of the task directly, and relied solely on logs to keep track of that.
Also, very curious as to what use cases Flyte would be more appropriate than Dagster or Temporal or vice versa.
s
Flyte is pretty well-equipped to handle long-running jobs. It's k8s-native, so it runs the containers natively alongside the orchestrator in k8s. I haven't heard of folks experiencing a workflow that loses connections to pods before.
Flyte really shines on the recoverability aspect. It saves intermediate inputs and outputs in an object store automatically, so if a task fails at the end of a workflow, you can click "recover" and it will hydrate the inputs to the failed task and continue on. There are also features around:
• task-level retries
• intra-task checkpointing
• caching
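To make the "recover" idea concrete, here is a minimal plain-Python sketch of the general pattern: persist each step's output, and on a rerun skip any step whose output is already stored. This is not Flyte's implementation or API. The `output_store` dict stands in for an object store, and `run_pipeline`, `download`, and `ocr` are hypothetical names made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an object store such as S3 or Azure Blob Storage.
output_store: Dict[str, str] = {}

# Counts how often each step body actually executes.
calls = {"download": 0, "ocr": 0}


def run_pipeline(steps: List[Tuple[str, Callable[[str], str]]], initial: str) -> str:
    """Run steps in order, persisting each output and skipping steps already stored."""
    value = initial
    for name, fn in steps:
        if name in output_store:
            # Output survived an earlier attempt: reuse it instead of recomputing.
            value = output_store[name]
            continue
        value = fn(value)  # may raise; outputs of earlier steps stay stored
        output_store[name] = value
    return value


def download(url: str) -> str:
    calls["download"] += 1
    return f"bytes from {url}"


def ocr(blob: str) -> str:
    calls["ocr"] += 1
    if calls["ocr"] == 1:
        raise RuntimeError("simulated transient failure")
    return blob.upper()


steps = [("download", download), ("ocr", ocr)]

try:
    run_pipeline(steps, "https://example.com/a.pdf")
except RuntimeError:
    pass  # the first attempt fails at the last step

# "Recover": the download output is reused, only the failed step runs again.
result = run_pipeline(steps, "https://example.com/a.pdf")
```

After the second call, `download` has executed only once, which is the property that matters when the early steps are the expensive ones.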
The main difference between Flyte and Prefect/Dagster/Airflow/other pure orchestrators is that it integrates the infrastructure. This architecture has 2 main benefits:
• You don't need to manage 2 tools (i.e. Prefect for orchestration, Step Functions/Batch/k8s for compute)
• It's more efficient, since the orchestrator knows about all the compute that is available. So you'll end up with higher utilization since you can run multiple tasks from different workflows concurrently on the same compute nodes
@freezing-airport-6809 may want to comment further 🙂
c
I've heard comments that temporal is also really well suited for use cases that have more complex workflows and not just the data piece like prefect/dagster. How would you say Flyte compares with temporal with regard to that?
s
Flyte is also built to be more scalable - whereas IIRC some other tools listed are doing the orchestration in a Python runtime, which can't scale beyond a certain point, Flyte is running in Go in the backend and is built to run hundreds of thousands of concurrent processes
I don't know quite as much about Temporal, but as I understand it, Temporal is great for big-scale microservice orchestration. But they don't deal in data. The maximum amount of data you can pass between steps of your workflow is quite small. You can of course manually offload and onload data from an object store within each step, but this can be cumbersome. Flyte kind of "just works" in this respect: https://flyte-next.readthedocs.io/en/auto-gen-toc/concepts/data_management.html
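For a sense of what "manually offload and onload data" looks like in practice, here is a rough sketch of that pattern in plain Python. It is not Temporal or Flyte code. The in-memory `object_store` dict stands in for a real object store, and `offload`, `onload`, `extract_step`, and `transform_step` are hypothetical names.

```python
import json
import uuid
from typing import Dict, List

# Hypothetical in-memory stand-in for an object store (S3, GCS, Azure Blob).
object_store: Dict[str, bytes] = {}


def offload(records: List[dict]) -> str:
    """Write a large payload to the store and return a small reference to it."""
    key = f"payloads/{uuid.uuid4()}.json"
    object_store[key] = json.dumps(records).encode("utf-8")
    return key


def onload(key: str) -> List[dict]:
    """Fetch and decode a payload previously written by offload()."""
    return json.loads(object_store[key].decode("utf-8"))


def extract_step() -> str:
    records = [{"id": i, "text": f"page {i}"} for i in range(1000)]
    return offload(records)  # only the short key crosses the step boundary


def transform_step(key: str) -> int:
    records = onload(key)
    return sum(len(r["text"]) for r in records)


key = extract_step()
total_chars = transform_step(key)
```

Every step has to carry this offload/onload boilerplate itself, which is the cumbersome part when the engine only passes small payloads between steps.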
f
@curved-stone-6118 reading your use case, I do feel Union / Flyte is perfect for this. Compared to Temporal, you do not have to manage infra, and you get versioning, automatic cluster scaling up and down on demand, pythonic workflows, containerization, resource isolation, and resource targeting (e.g. CPU vs GPU). Flyte is more of an infrastructure platform and Temporal a workflow engine. Workflows are one part of what Flyte offers, but it does quite a bit more
try it once - and a quick way might be to use union serverless - signup.union.ai
cc @high-park-82026
h
Hey @curved-stone-6118 great to meet you! I agree with all of the above. To me, the fundamental difference between Flyte and Temporal is the responsibilities each assumes, and subsequently what that says about the user's responsibility.
Temporal is a workflow engine. It scales horizontally very well and helps maximize resource utilization of the engine. It offers durable executions and historical data about executions, and that's where its responsibilities end. This assumes the user is responsible for packaging their code as services, for scaling these services, for accounting for surge traffic... etc. This is why it works very well for microservices orchestration where, once in production, the load is predictable and the patterns can be monitored. On the other hand, it requires users to think of their code as a set of services from the get-go...
Flyte, on the other hand, encapsulates a powerful workflow engine but also assumes the responsibility of managing the entire infrastructure (thanks to it being k8s-native from day 1). This allows the user to write more intuitive code (function `a` calls functions `b` and `c` and passes data between them). The code can run and be debugged locally because it's just your native Python code, functions/tasks can have different dependencies and Docker images if needed... etc. Flyte will take care of spinning up the needed Pods to execute the workloads it's asked to execute on demand, and scale down to 0 when everything is done.
Disclaimer: I work for Union, the company sponsoring and maintaining Flyte. Happy to jump on a call to hear more about what you are trying to build and answer questions.
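A minimal sketch of the "function a calls functions b and c" shape, using flytekit's `task` and `workflow` decorators. The task bodies and the example URL are made-up placeholders, and the `ImportError` fallback is only there so the snippet also runs as plain Python when flytekit is not installed.

```python
try:
    from flytekit import task, workflow
except ImportError:
    # Fallback so the sketch still runs as plain Python without flytekit.
    def task(fn):
        return fn

    workflow = task


@task
def b(url: str) -> str:
    """Placeholder for a download step."""
    return f"contents of {url}"


@task
def c(document: str) -> int:
    """Placeholder for a processing step: counts whitespace-separated words."""
    return len(document.split())


@workflow
def a(url: str) -> int:
    # The output of b is passed to c; tasks are called with keyword arguments.
    document = b(url=url)
    return c(document=document)


# Local execution: the workflow is invoked like a regular Python function.
word_count = a(url="https://example.com/report.pdf")
```

Running this locally just calls the functions in-process; when registered on a Flyte cluster, the expectation is that each task executes in its own pod.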