👋 I’m hoping to get a better understanding of when a workflow is considered “executed” (Flyte #flytekit)

lively-sundown-82704

03/23/2022, 4:14 PM

👋 I’m hoping to get a better understanding of when a workflow is considered “executed”. I have a series of workflows that are structured as follows:
• Workflow A: A prerequisite workflow that does some initial data processing
• Workflow B: A downstream consumer of WF A, that invokes WF A as a subworkflow in execution
• Workflow C: A downstream consumer of WF A, that invokes WF A as a subworkflow in execution - has no relationship to Workflow B other than the shared dependency on A

I launched Workflows B and C sequentially, and as expected:
• A was invoked from B, and the Flyte console indicates that A’s output was written to cache
• A was then invoked from C, and the Flyte console indicates that A’s output was read from cache

However, when I look at the Flyte console for WF A, it indicates that no executions can be found. That leads me to believe that a workflow is only considered executed if it is explicitly launched by a user, and that sub-workflow executions are not assigned their own execution IDs. Is my understanding correct?

hallowed-mouse-14616

03/23/2022, 4:22 PM

Hi @lively-sundown-82704, great question! The statement "sub-workflow executions are not assigned their own execution IDs" is correct. This is actually rather complex, so I'll try to explain, but please let me know if anything doesn't make sense.

hallowed-mouse-14616

03/23/2022, 4:24 PM

When you use a subworkflow as part of another workflow (as in your example of Workflows B and C using Workflow A), Flyte compiles that subworkflow into the execution DAG. This execution can then be thought of as "inline" in a sense: a separate execution of Workflow A is not tracked, but all of its tasks are executed according to their dependencies.

hallowed-mouse-14616

03/23/2022, 4:26 PM

If you are looking to track a separate execution, I believe launch plans allow you to do this. A launch plan starts a separate execution of a workflow (Workflow A in your example), and the parent workflow (Workflows B and C in your example) then tracks that external execution while it runs.
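The difference described above can be sketched with a small toy model in plain Python. This is not Flyte or flytekit code: `Execution`, `new_execution`, `run_a_as_subworkflow`, `run_a_via_launch_plan`, and the `exec-N` ID format are all hypothetical names made up for illustration. It only shows why the console would list no executions for A in the subworkflow case.

```python
import itertools
from dataclasses import dataclass, field

_ids = itertools.count(1)  # hypothetical ID generator, not Flyte's scheme


@dataclass
class Execution:
    """Toy stand-in for a tracked workflow execution (not a Flyte type)."""
    workflow: str
    exec_id: str
    nodes: list = field(default_factory=list)


def new_execution(workflow, registry):
    # Only executions created here are "visible" when listing by workflow.
    execution = Execution(workflow, f"exec-{next(_ids)}")
    registry.append(execution)
    return execution


def run_a_as_subworkflow(parent):
    # Inlined: A's tasks become nodes of the parent execution; no new ID.
    parent.nodes += ["A.task1", "A.task2"]


def run_a_via_launch_plan(parent, registry):
    # Separate execution: A gets its own ID; the parent only references it.
    child = new_execution("A", registry)
    child.nodes += ["A.task1", "A.task2"]
    parent.nodes.append(f"launchplan-node -> {child.exec_id}")
    return child


registry = []
b = new_execution("B", registry)
run_a_as_subworkflow(b)                     # A runs inside B's execution
c = new_execution("C", registry)
child = run_a_via_launch_plan(c, registry)  # A runs as its own execution

print([e.workflow for e in registry])  # ['B', 'C', 'A']
```

Only the launch plan path adds an entry for A to the registry; the subworkflow path leaves A's tasks recorded under B's execution ID.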

hallowed-mouse-14616

03/23/2022, 4:27 PM

@freezing-airport-6809 @high-park-82026 Can you confirm that launchplans allow subworkflow executions to be tracked in the UI? ^^^

lively-sundown-82704

03/23/2022, 4:31 PM

I see, thanks for the quick response! As a follow-up, does this also mean that sub-workflows would not emit any `WorkflowExecutionEvents`? Put differently, if we invoke A as a sub-workflow from B, can we expect to see `WorkflowExecutionEvents` emitted for workflows A and B with the same execution ID? Or would Flyte only emit `NodeExecutionEvents` for all of the tasks that define A over the course of executing B?

hallowed-mouse-14616

03/23/2022, 4:37 PM

Correct, subworkflows will not emit their own `WorkflowExecutionEvents`; rather, Flyte will only emit events for the parent workflow, which will include `NodeExecutionEvents`. So in your example, Workflows B and C will emit `NodeExecutionEvents` for all of the tasks in Workflow A, with the execution IDs of the top-level executions (i.e. B and C) respectively.

👍 2

hallowed-mouse-14616

03/23/2022, 4:38 PM

Again, using launch plans means that Workflow A will run as a separate execution and will therefore emit its own `WorkflowExecutionEvents`, if that is the desired behavior.

👍 2
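As a rough illustration of the event behavior described in the two messages above, here is a toy model in plain Python. It is not Flyte's event API: the `Event` tuple, both functions, and the execution IDs `exec-b` and `exec-a` are hypothetical, and the model emits a single event per subject rather than one per phase. The node event for the launch plan node in the parent is an assumption of this sketch.

```python
from collections import namedtuple

# Simplified, hypothetical event record; real events carry far more detail.
Event = namedtuple("Event", ["kind", "execution_id", "subject"])


def run_with_subworkflow(parent, exec_id, sub_tasks):
    """Inlined subworkflow: one workflow event, node events under the parent ID."""
    events = [Event("WorkflowExecutionEvent", exec_id, parent)]
    events += [Event("NodeExecutionEvent", exec_id, t) for t in sub_tasks]
    return events


def run_with_launch_plan(parent, exec_id, child, child_exec_id, child_tasks):
    """Launch plan: the child gets its own workflow event and execution ID."""
    events = [
        Event("WorkflowExecutionEvent", exec_id, parent),
        # Assumption: the parent records a node for the launch plan itself.
        Event("NodeExecutionEvent", exec_id, f"launch {child}"),
        Event("WorkflowExecutionEvent", child_exec_id, child),
    ]
    events += [Event("NodeExecutionEvent", child_exec_id, t) for t in child_tasks]
    return events


sub = run_with_subworkflow("B", "exec-b", ["A.t1", "A.t2"])
lp = run_with_launch_plan("B", "exec-b", "A", "exec-a", ["A.t1", "A.t2"])

for name, events in (("subworkflow", sub), ("launch plan", lp)):
    wf_subjects = [e.subject for e in events if e.kind == "WorkflowExecutionEvent"]
    print(name, wf_subjects)
```

In the subworkflow case only B appears as the subject of a workflow-level event; in the launch plan case both B and A do, each under its own execution ID.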

freezing-airport-6809

03/23/2022, 4:38 PM

this is correct. Thank you for the great answer Dan

👏 1

lively-sundown-82704

03/23/2022, 4:40 PM

Yes, that is the desired behavior for our application - I’ll try to invoke A with a launch plan and let you know whether we can observe a dedicated `WorkflowExecutionEvent` / unique execution ID for A. Really appreciate the help here, thanks!

🙌 1

freezing-airport-6809

03/23/2022, 4:40 PM

I am sure you can

freezing-airport-6809

03/23/2022, 4:40 PM

as you get a new execution ID

lively-sundown-82704

03/23/2022, 4:41 PM

ok great!

lively-sundown-82704

03/23/2022, 5:55 PM

CC @flaky-action-19778 @famous-businessperson-24711

boundless-pizza-95864

03/23/2022, 8:11 PM

This looks like a good candidate for the Q&A section to me.

famous-businessperson-24711

03/23/2022, 10:53 PM

thanks for this explanation everyone! however, one point of confusion for me is that this seems to contradict the point that subworkflows aren't compiled into the dag and act like "barriers" to further execution https://flyte-org.slack.com/archives/CREL4QVAQ/p1646854441257309

famous-businessperson-24711

03/23/2022, 10:53 PM

are both true?

hallowed-mouse-14616

03/23/2022, 11:01 PM

@famous-businessperson-24711 good point. Maybe it would be good to clarify how subworkflows are included in the DAG. There is a single-node entrypoint called the SubworkflowNode, which kind of acts as a start point for the subworkflow execution. As you observed, this node requires all of the inputs for the subworkflow to be available before entering the subworkflow execution. As you pointed out, there is certainly an optimization where we could remove this node and just connect the subworkflow into the DAG based solely on input dependencies. I think this would be awesome. Off the top of my head I'm not sure of the effort needed to implement this, but I'm sure it's a non-negligible amount of work. Is it something you (or your colleagues) might be interested in helping out with?
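The "barrier" effect of a single entrypoint node, and what the proposed optimization would change, can be sketched with a toy scheduling calculation in plain Python. This is not how Flyte's propeller is implemented: `finish_times`, the input names `x` and `y`, the task names, and all durations are hypothetical values chosen for illustration.

```python
def finish_times(ready_at, tasks, barrier):
    """Compute when each subworkflow task finishes in a toy model.

    ready_at: time at which each subworkflow input becomes available.
    tasks: name -> (duration, dependencies), listed in topological order;
           a dependency is either an input name or an earlier task name.
    barrier: if True, no task may start before ALL inputs are ready,
             mimicking a single entrypoint node in front of the subworkflow.
    """
    gate = max(ready_at.values()) if barrier else 0
    done = dict(ready_at)
    for name, (duration, deps) in tasks.items():
        start = max([gate] + [done[d] for d in deps])
        done[name] = start + duration
    return {name: done[name] for name in tasks}


# Input x is ready at t=1, input y only at t=5.
inputs = {"x": 1, "y": 5}
# a1 needs only x; a2 needs y and a1's output.
sub_a = {"a1": (2, ["x"]), "a2": (1, ["y", "a1"])}

with_barrier = finish_times(inputs, sub_a, barrier=True)
flattened = finish_times(inputs, sub_a, barrier=False)

print(with_barrier)  # {'a1': 7, 'a2': 8}
print(flattened)     # {'a1': 3, 'a2': 6}
```

With the barrier, `a1` waits for `y` even though it never uses it; wiring tasks purely by input dependencies lets `a1` start as soon as `x` is ready.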

famous-businessperson-24711

03/23/2022, 11:06 PM

right now, we only have capacity to help if the scope is on the level of a weekend hack (i might be interested). otherwise it's not in our critical path, but maybe in the future, when performance becomes a chief concern, we can consider it for sure

hallowed-mouse-14616

03/23/2022, 11:12 PM

Oh sure, if you want to create an issue we can certainly track this. Like you said, it's not a critical path item, but if it has a lot of interest I'm sure we could carve out a spot in the roadmap for it. Otherwise, I'd be happy to help anyone who is interested in hacking at it.

freezing-airport-6809

03/23/2022, 11:13 PM

I definitely think this is a perf improvement that we can make in the future

freezing-airport-6809

03/23/2022, 11:14 PM

TBH, the simplified execution model today makes it easier for users to follow. It's like invoking a function

freezing-airport-6809

03/23/2022, 11:14 PM

you cannot invoke a function with partial inputs, right?

freezing-airport-6809

03/23/2022, 11:15 PM

if you look at the programming model here, it seems like invoking a function. But, you are right, we can absolutely optimize this as a post-compilation pass (optimization pass), which we intend to do in the future

famous-businessperson-24711

03/23/2022, 11:17 PM

yea but you'd also expect a function to be represented in the call stack - as it were 😉

famous-businessperson-24711

03/23/2022, 11:17 PM

to borrow the analogy and return to the original question

famous-businessperson-24711

03/23/2022, 11:25 PM

in any case, thanks for the explanation! if we're ranking priorities i think we'd vote for lps/workflows behaving more similarly, given their semantic similarity in user code (at least in flytekit). ie it's a bit difficult to understand whether you're invoking a subworkflow or an lp (i thought workflows were wrapped in default lp's?) but there are some important implications in the system. or perhaps having them be semantically different in the api

👍 1

famous-businessperson-24711

03/23/2022, 11:26 PM

i realize this is also probably a non trivial change

famous-businessperson-24711

03/23/2022, 11:27 PM

also sorry we only come to you when we have problems 😄 flyte is a great product!

😆 1
