Hi community As a company our workflows are very modular the Flyte #flyte-support

best-oil-18906

10/29/2023, 9:52 AM

Hi community, As a company, our workflows are very modular: they consist of several cached tasks that run in a chain and should finish in a few minutes. We are concerned about pod overhead, that is, the trade-off between modularity (the number of tasks in a workflow that run sequentially) and workflow run time, since each new task requires a new pod and this can be costly in time. It would be awesome if sequential tasks that require the same environment could run on the same pod, without the overhead of launching a new pod per task (while task caching still happens along the way). Are there any upcoming or existing features that address this need? Any personal insights?

freezing-airport-6809

10/29/2023, 4:42 PM

This is in progress as we speak; we are working on it. If results are cached, that should be fast.

freezing-airport-6809

10/29/2023, 4:42 PM

Our goal has been to build reproducible, shareable pipelines first. Making them go as fast as they can is a long-term objective of this project, and we will keep working on it.

freezing-airport-6809

10/29/2023, 4:47 PM

But @best-oil-18906 we would love to learn more about your use cases to see how the things we are building should align.

best-oil-18906

10/30/2023, 10:17 AM

@freezing-airport-6809 Hey Ketan, We are a company that samples brain data using an original headset. The data then undergoes different analyses:
• Preprocessing: the data (originally in S3) goes through different signal preprocessing steps (tasks) that require efficient hyperparameter optimization, with the flexibility to repeat steps, skip steps, and change step order.
• EDA: a single preprocessed dataset can be assessed in a number of different workflows that extract and visualize the signal quality and brain reaction (these are called "single analyses").
• Group analyses: workflows that run single analyses on several groups of datasets and compare the groups through different statistics and visualizations.
• ML models: developed as workflows that have feature extraction, feature engineering, feature selection, training, and testing tasks. We only have one such workflow, which we are integrating into Flyte these days.
Flyte results are also being ingested by a database that is connected to BI dashboards.

freezing-airport-6809

10/30/2023, 4:37 PM

Are you using Flyte in EDA and interactive analysis, and that's where you would like more speed?

best-oil-18906

11/05/2023, 9:30 AM

@freezing-airport-6809 Yeah, but more fundamentally, we need workflows that are highly flexible and have several short-running tasks (a few minutes each) that run in parallel and in a chain. Basically, our issue is that we want each logical component of our flow to be a task, but sometimes these tasks are short and therefore not worth the pod overhead. For example, we tried to use eager workflows in order to execute some Python code outside of a task. If our flow logically consists of a, b, c, we implemented a and c as tasks, while b is short and not worth the pod overhead of a new task, which led us to try to implement b outside of tasks, in the workflow code, using eager workflows. We ran into some issues with that, so we will join a and b into a single task in order to avoid making b a task of its own. This means that the implemented flow is not fully congruent with the logical structure we want.
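
[Editor's sketch] The workaround described above (joining a and b into one task so that b does not get a pod of its own) might look roughly like the following. This is a minimal illustration, not the poster's code: the step names, the toy signal logic, and the cache versions are all hypothetical, and the `try/except` fallback exists only so the sketch also runs where flytekit is not installed.

```python
from typing import List

try:
    from flytekit import task, workflow
except ImportError:
    # Fallback (assumption, for illustration only): no-op decorators so the
    # sketch runs as plain Python when flytekit is not available.
    def task(_fn=None, **_kwargs):
        def wrap(fn):
            return fn
        return wrap(_fn) if _fn is not None else wrap

    workflow = task


def step_b(samples: List[float]) -> List[float]:
    """Logical step b: short, so kept as a plain helper rather than a task."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


@task(cache=True, cache_version="1")
def step_a_then_b(raw: List[float]) -> List[float]:
    """Logical step a (drop negative samples), then b, in one task."""
    cleaned = [x for x in raw if x >= 0.0]
    return step_b(samples=cleaned)


@task(cache=True, cache_version="1")
def step_c(samples: List[float]) -> float:
    """Logical step c: a toy summary statistic (range of the signal)."""
    return max(samples) - min(samples)


@workflow
def pipeline(raw: List[float]) -> float:
    # Two tasks instead of three: b no longer pays its own pod startup cost,
    # but it also loses its own cache entry and its own node in the graph.
    centered = step_a_then_b(raw=raw)
    return step_c(samples=centered)
```

The trade-off matches the one raised in the message: b stays a separate function in the code, but Flyte only sees and caches the combined a+b unit.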

freezing-airport-6809

11/06/2023, 5:28 AM

cc @broad-monitor-993 / @high-park-82026 / @hallowed-mouse-14616 @best-oil-18906 you are right, today Flyte cannot reuse pods. We do have something in progress, but it is not ready to reveal yet 🙂 - stay tuned! We would love to learn more about the problems with Eager and also completely understand. Once the Flyte paradigm works, it can be used for a lot of use cases. We are just choosing our specializations today.

👍 1

broad-monitor-993

11/06/2023, 6:04 PM

b is short and not worth the pod overhead of a new task, which led us to try to implement b outside of tasks in the workflow code using eager workflows. We ran into some issues with that

if you can, please share minimal reproducible code that we can help debug

best-oil-18906

11/08/2023, 9:47 AM

@freezing-airport-6809 Looking forward to new stuff 🙂 We love Flyte. @broad-monitor-993 Sadly, we don't have that code anymore, but the engineer who handled the eager workflows implementation reported that it was really hard to debug type mismatches, even more so than in regular workflows. (Sadly, this is generally my personal time bottleneck with Flyte: the errors aren't clear enough and require manual investigation.)

❤️ 1

freezing-airport-6809

11/08/2023, 3:22 PM

We want to continue doing cool stuff, but for that I plead: if you love the product, please share it and write a blog post.

best-oil-18906

11/09/2023, 11:20 AM

@freezing-airport-6809 Where?

broad-monitor-993

11/09/2023, 1:58 PM

@best-oil-18906 I don’t think we have specific opinions on where. It could be your personal blog, your company blog, or even the Flyte blog! Whichever you prefer, we can help amplify it in Slack and on other socials @tall-lock-23197 @average-finland-92144

👋 1

tall-lock-23197

11/09/2023, 3:47 PM

We also hold biweekly community meetings. If you'd like to share your Flyte journey, let us know!

➕ 1
