Hi everyone, I am an NLP scientist at Enterpret. We are currently looking to scale our processes, and we are looking for tools that abstract away infra and compute management. We are evaluating them against the following points:
• There are multiple tasks in a single flow.
• Data needs to be transferred from one task to another task.
• Data that needs to be transferred can be huge, so persistence of data at each step of the flow (and / or) at the end of the flow is needed.
• Should be able to run each task manually on a local machine (for experimentation / debugging purposes) and on the cloud (for scaling purposes).
• Should be able to run tasks in parallel when they are independent of each other in a flow.
• Should be able to run flows in parallel -> could be useful for running different experiments at the same time.
• Should be able to monitor the progress of each task in a flow, since tasks can take a long time.
• Compute needed for each task in a flow can be different. Ex: training needs a GPU, data gathering can work on a CPU.
• Compute used should scale down when no flow is running.
• Each step in a flow can have different dependencies.
• Learning curve should be gentle so that data scientists do not feel overwhelmed.
I have evaluated Metaflow, but configuring it took a lot of time, and packaging custom-built code is difficult.
I was also evaluating AWS Sagemaker, but the complexity is too high. Can someone help me understand the differences between Flyte and Sagemaker, and the pros and cons of both, so that I can explain them to my team?
Would Flyte cover my use cases?
I also write blogs on MLOps. Interested in contributing as well. https://ravirajag.dev/
Hi @Raviraja Ganta, I read through your use cases, and I do think Flyte is a very good fit (I am biased 😃)
Also, firstly, welcome to the community. We are more than excited to have you here and that you are giving Flyte a chance 🙂
Also, blogging about Flyte sounds amazing.
I would love to catch up and help answer your questions on the differences between Flyte and Sagemaker.
Flyte can actually integrate with Sagemaker, but Flyte also uses Kubernetes by default.
Moreover, Flyte is a type-safe, data-aware orchestration system designed to make it easy to write large, complex pipelines and author them in Python, Java or Scala.
The idea of type safety is to find errors ahead of time, even before running.
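To make the type-safety point concrete, here is a minimal plain-Python sketch of checking that one task's output type matches the next task's input type before anything runs. This is not Flyte's API: `check_link`, `gather_data`, `train` and `bad_train` are hypothetical names used only for illustration, and the check is a simple equality test on annotations.

```python
from typing import Callable, get_type_hints


def check_link(upstream: Callable, downstream: Callable, param: str) -> None:
    """Raise TypeError if upstream's return annotation differs from
    the annotation of downstream's `param` (hypothetical helper)."""
    produced = get_type_hints(upstream).get("return")
    expected = get_type_hints(downstream).get(param)
    if produced != expected:
        raise TypeError(
            f"{upstream.__name__} returns {produced}, but "
            f"{downstream.__name__}.{param} expects {expected}"
        )


def gather_data(n: int) -> list:
    return list(range(n))


def train(data: list) -> float:
    return sum(data) / len(data)


def bad_train(data: str) -> float:
    return float(len(data))


# Wiring is validated from annotations alone; no task body has executed yet.
check_link(gather_data, train, "data")

try:
    check_link(gather_data, bad_train, "data")
    wiring_error = None
except TypeError as err:
    wiring_error = str(err)  # mismatch caught before any "run"
```

The mismatch between `gather_data` and `bad_train` surfaces while the flow is being wired together, which is the kind of early failure the message above describes.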
It also supports memoization and recoverability from disastrous failures.
It is also designed to be incremental, so you can adopt it slowly and expand the use cases.
Many features are designed around ML, and it solves problems like distributed training and connecting disparate technologies, such as Spark and distributed training, with simple single-process compute.
In essence, we want to provide a serverless experience for end users by cleanly separating the infrastructure responsibilities from the user code.
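As a rough illustration of the memoization idea mentioned above, the sketch below caches a task's output under a key built from the task name, a version string and its inputs, so a repeated call with the same inputs is not recomputed. It is plain Python with an in-memory dict, not Flyte's implementation; `cached_task`, `_cache` and `call_count` are hypothetical names, and inputs are assumed to be JSON-serializable.

```python
import functools
import hashlib
import json
from typing import Any, Callable, Dict

_cache: Dict[str, Any] = {}          # in-memory stand-in for a persistent store
call_count = {"gather_data": 0}      # counts real executions, for demonstration


def cached_task(version: str) -> Callable:
    """Memoize a task on (task name, version, keyword inputs)."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(**kwargs: Any) -> Any:
            key_src = json.dumps(
                {"task": fn.__name__, "version": version, "inputs": kwargs},
                sort_keys=True,
            )
            key = hashlib.sha256(key_src.encode()).hexdigest()
            if key not in _cache:
                _cache[key] = fn(**kwargs)   # run only on a cache miss
            return _cache[key]
        return wrapper
    return decorator


@cached_task(version="1")
def gather_data(n: int) -> list:
    call_count["gather_data"] += 1
    return list(range(n))


first = gather_data(n=3)
second = gather_data(n=3)   # same inputs: served from the cache
third = gather_data(n=4)    # different inputs: executed again
```

Bumping the version string changes the key, which is one simple way to invalidate cached results after the task's code changes.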
I can DM you
5 months ago
Hi Ketan, thanks for the explanation.
4 months ago
Hey @Raviraja Ganta, just checking in. How is your Flyte excursion coming along? BTW, I really like your blog. Great visuals. I would love to chat. I will DM you if you are open to it.