Hi community, As a company, our workflows are very...
# ask-the-community
t
Hi community. As a company, our workflows are very modular: they consist of several cached tasks that run in a chain and should finish in a few minutes. We are concerned about pod overhead, specifically the trade-off between modularity (the number of tasks that run sequentially in a workflow) and workflow run time, since each new task requires a new pod and that can be costly in time. It would be awesome if sequential tasks that require the same environment could run on the same pod, without the overhead of launching a new pod per task (while task caching still happens along the way). Are there any upcoming or existing features that address this need? Any personal insights?
k
This is in progress as we speak; we are working on it. If results are cached, that should be fast.
Our goal has been to build reproducible, shareable pipelines first. Making them go as fast as they can is a long-term objective of this project, and we will keep working on it.
But @Tom Touati, we would love to learn more about your use cases to see how the things we are building should align.
t
@Ketan (kumare3) Hey Ketan, we are a company that samples brain data using an original headset. The data then undergoes different analyses:
• The data (originally in S3) undergoes different signal preprocessing steps (tasks) that require efficient hyperparameter optimization, with the flexibility to repeat steps, skip steps, and change step order.
• EDA: a single preprocessed dataset can be assessed in a number of different workflows that extract and visualize the signal quality and brain reaction (these are called "single analyses").
• Group analyses are workflows that run single analyses on several groups of datasets and compare the groups through different statistics and visualizations.
• ML models are developed as workflows that have feature extraction, feature engineering, feature selection, training, and testing tasks. We only have one such workflow, which we are integrating into Flyte these days.
Flyte results are also being ingested by a database that is connected to BI dashboards.
k
Are you using Flyte in EDA and interactive analysis, and that's where you would like more speed?
t
@Ketan (kumare3) Yeah, but more basically, we need workflows that are highly flexible, have several tasks that run in parallel and in a chain, and are short-running (a few minutes). Our issue is that we want each logical component of our flow to be a task, but sometimes these tasks are short and therefore not worth the pod overhead. For example, we tried to use eager workflows in order to execute some Python code outside of a task. If our flow logically consists of a, b, and c, we implemented a and c as tasks, while b is short and not worth the pod overhead of a new task, which led us to try to implement b outside of tasks, in the workflow code, using eager workflows. We ran into some issues with that, so we will join a and b in a single task in order to avoid making b a task of its own. This means that the implementation of our flow is not fully congruent with the logical structure we want.
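A minimal sketch of the workaround described above, where the short step b is kept as a plain Python function and runs inside the same task as a, so only two pods are needed instead of three. The step names (`step_a`, `step_b`, `a_then_b`, `c`, `pipeline`) and the toy computations are hypothetical and only illustrate the shape of the flow; `task` and `workflow` are flytekit's decorators, and the no-op fallback is an assumption added so the sketch also runs where flytekit is not installed.

```python
from typing import List

try:
    from flytekit import task, workflow
except ImportError:
    # Assumption: no-op stand-ins so the sketch runs without flytekit.
    def task(fn):
        return fn

    def workflow(fn):
        return fn


def step_a(raw: List[float]) -> List[float]:
    """Hypothetical step a: scale the raw signal."""
    return [2.0 * x for x in raw]


def step_b(values: List[float]) -> List[float]:
    """Hypothetical short step b: remove the mean. Plain Python, not a task."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


@task
def a_then_b(raw: List[float]) -> List[float]:
    # a and b share one task (one pod); b stays a separate function in code.
    return step_b(step_a(raw))


@task
def c(values: List[float]) -> float:
    """Hypothetical step c: summarize the preprocessed signal."""
    return sum(v * v for v in values)


@workflow
def pipeline(raw: List[float]) -> float:
    # Flyte tasks are called with keyword arguments inside a workflow.
    return c(values=a_then_b(raw=raw))


if __name__ == "__main__":
    print(pipeline(raw=[1.0, 2.0, 3.0]))
```

The cost of this layout is the one raised in the message: b is no longer cached or retried on its own, and the task graph shows two nodes although the flow has three logical steps.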
k
cc @Niels Bantilan / @Haytham Abuelfutuh / @Dan Rammer (hamersaw) @Tom Touati you are right, today Flyte cannot reuse pods. We do have something in progress, but it is not ready to reveal yet 🙂 - stay tuned! We would love to learn more about the problems with Eager, and we completely understand. Once the Flyte paradigm works, it can be used for a lot of use cases. We are just choosing our specializations today.
n
b is short and not worth the pod overhead of a new task, which led us to try to implement b outside of tasks in the workflow code using eager workflows. We ran into some issues with that
if you can, please share minimal reproducible code that we can help debug
t
@Ketan (kumare3) Looking forward to new stuff 🙂 We love Flyte. @Niels Bantilan Sadly we don't have that code anymore, but the engineer who handled the eager workflows implementation reported that type mismatches were really hard to debug, even more so than in regular workflows (which, sadly, is generally my personal time bottleneck with Flyte: the errors aren't clear enough and require manual investigation).
k
We want to continue doing cool stuff, but for that I have a plea: if you love the product, please share it, for example by writing a blog post.
t
@Ketan (kumare3) Where?
n
@Tom Touati I don’t think we have specific opinions on where. It could be your personal blog, your company blog, or even the Flyte blog! Whichever you prefer, and we can help amplify it on Slack and other socials @Samhita Alla @David Espejo (he/him)
s
We also hold biweekly community meetings. If you'd like to share your Flyte journey, let us know!