:wave: I’m hoping to get a better understanding of...
# flytekit
v
👋 I’m hoping to get a better understanding of when a workflow is considered “executed”. I have a series of workflows that are structured as follows:
• Workflow A: A prerequisite workflow that does some initial data processing
• Workflow B: A downstream consumer of WF A, that invokes WF A as a subworkflow in execution
• Workflow C: A downstream consumer of WF A, that invokes WF A as a subworkflow in execution - has no relationship to Workflow B other than the shared dependency to A
I launched Workflows B and C sequentially, and as expected:
• A was invoked from B, and the Flyte console indicates that A’s output was written to cache
• A was then invoked from C, and the Flyte console indicates that A’s output was read from cache
However, when I look at the Flyte console for WF A, it indicates that no executions can be found. That leads me to believe that a workflow is only considered executed if it is explicitly launched by a user, and that sub-workflow executions are not assigned their own execution IDs. Is my understanding correct?
d
Hi @Varun Kulkarni, great question! The statement "sub-workflow executions are not assigned their own execution IDs" is correct. This is actually rather complex, so I'll try to explain, but please let me know if anything doesn't make sense.
When you use a subworkflow as part of another workflow (as in your example of Workflow B and C using Workflow A), Flyte compiles that subworkflow into the execution DAG. This execution can then be thought of as "inline" in a sense: the separate execution (of Workflow A) will not be tracked, but all of its tasks are executed according to the dependencies.
If you are looking to track a separate execution I believe that launchplans allow you to do this. The idea of a launch plan is to start a separate execution of a workflow (in your example Workflow A) and then during execution of the parent workflow (in your example Workflows B and C) they track the external execution.
@Ketan (kumare3) @Haytham Abuelfutuh Can you confirm that launchplans allow subworkflow executions to be tracked in the UI? ^^^
v
I see, thanks for the quick response! As a follow-up, does this also mean that sub-workflows would also not emit any WorkflowExecutionEvents? Put differently, if we invoke A as a sub-workflow from B, can we expect to see WorkflowExecutionEvents emitted for workflows A and B with the same execution ID? Or would Flyte only emit NodeExecutionEvents for all of the tasks that define A over the course of executing B?
d
Correct, subworkflows will not emit their own WorkflowExecutionEvents; rather, Flyte will only emit events for the parent workflow, which will include NodeExecutionEvents. So in your example, Workflows B and C will emit NodeExecutionEvents for all of the tasks in Workflow A, with the execution IDs of the top-level executions (i.e. B and C) respectively.
👍 2
Again, the use of launchplans means that Workflow A will be executed as a separate execution and therefore will emit its own WorkflowExecutionEvents, if that is the desired behavior.
👍 2
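To make the difference concrete, here is a toy model in plain Python of the behavior described above. It is only a sketch and not Flyte code: the `WORKFLOWS` table, the `launch` helper, the step names, and the `exec-N` ID format are all made up for illustration, and the real event payloads are much richer. The one thing it tries to capture is that a subworkflow's nodes report under the parent's execution ID, while a launch plan starts a new execution with its own ID and its own workflow-level event.

```python
from itertools import count

# Hypothetical workflow definitions: each step is a (kind, name) pair.
WORKFLOWS = {
    "A": [("task", "preprocess")],
    "B_subworkflow": [("subworkflow", "A"), ("task", "train")],
    "B_launchplan": [("launchplan", "A"), ("task", "train")],
}

_next_id = count(1)


def launch(workflow, events):
    """Start a new top-level execution, record its events, return its ID."""
    exec_id = f"exec-{next(_next_id)}"
    events.append(("WorkflowExecutionEvent", exec_id, workflow))
    _run_steps(workflow, exec_id, events)
    return exec_id


def _run_steps(workflow, exec_id, events):
    for kind, name in WORKFLOWS[workflow]:
        if kind == "task":
            events.append(("NodeExecutionEvent", exec_id, name))
        elif kind == "subworkflow":
            # Inlined: the child's nodes run under the parent's execution ID
            # and no workflow-level event is recorded for the child.
            _run_steps(name, exec_id, events)
        elif kind == "launchplan":
            # Separate execution: the child gets its own ID and its own
            # workflow-level event; the parent keeps a node that tracks it.
            launch(name, events)
            events.append(("NodeExecutionEvent", exec_id, f"launch:{name}"))


if __name__ == "__main__":
    for top_level in ("B_subworkflow", "B_launchplan"):
        recorded = []
        launch(top_level, recorded)
        print(top_level)
        for event in recorded:
            print("  ", event)
```

In this model, launching `B_subworkflow` records a single workflow-level event (for B), whereas launching `B_launchplan` records two, one for B and one for A, each with a distinct ID.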
k
this is correct. Thank you for the great answer Dan
👏 1
v
Yes, that is the desired behavior for our application - I’ll try to invoke A with a launchplan and let you know whether we can observe a dedicated WorkflowExecutionEvent / unique execution ID for A. Really appreciate the help here, thanks!
🙌 1
k
I am sure you can
as you get a new execution ID
v
ok great!
CC @Sohan Shah @Dylan Wilder
s
This looks like a good candidate for the Q&A section to me.
d
thanks for this explanation everyone! however one point of confusion for me is that it seems contradictory, in that subworkflows aren't compiled into the dag and act like "barriers" to further execution https://flyte-org.slack.com/archives/CREL4QVAQ/p1646854441257309
are both true?
d
@Dylan Wilder good point. Maybe it would be good to clarify how subworkflows are included in the DAG. There is a single entrypoint node called the SubworkflowNode, which kind of acts as a start point for the subworkflow execution. As you observed, this node requires all of the inputs for the subworkflow to be available before entering the subworkflow execution. As you pointed out, there is certainly an optimization where we could remove this node and just connect the subworkflow into the DAG based solely on input dependencies. I think this would be awesome. Off the top of my head I'm not sure of the effort needed to implement this, however I'm sure it's a non-negligible amount of work. Is it something you (or your colleagues) might be interested in helping out with?
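A rough way to see what the "barrier" costs is the toy calculation below. It is plain Python, not Flyte's scheduler, and everything in it is hypothetical: the `start_times` function, the node and input names, and the time units. It compares a single entry node that waits for every subworkflow input against a flattened DAG where each inner node waits only for the inputs it actually consumes.

```python
def start_times(input_ready, node_inputs, barrier):
    """Return the earliest start time of each inner node of a subworkflow.

    input_ready: input name -> time at which that input becomes available
    node_inputs: inner node name -> list of input names the node consumes
    barrier:     True models a single entry node that waits for every input;
                 False models inner nodes wired directly to their own inputs.
    """
    # With a barrier, nothing inside starts before the slowest input arrives.
    gate = max(input_ready.values(), default=0) if barrier else 0
    return {
        node: max(gate, max((input_ready[i] for i in inputs), default=0))
        for node, inputs in node_inputs.items()
    }


if __name__ == "__main__":
    ready = {"x": 1, "y": 5}              # y arrives much later than x
    nodes = {"a1": ["x"], "a2": ["y"]}    # a1 only needs x, a2 only needs y
    print("barrier:  ", start_times(ready, nodes, barrier=True))
    print("flattened:", start_times(ready, nodes, barrier=False))
```

With these made-up numbers, `a1` has to wait until time 5 behind the barrier but could start at time 1 in the flattened form, which is the gap the proposed optimization pass would close.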
d
right now, we only have capacity to help if the scope is on the level of a weekend hack (i might be interested). otherwise it's not in our critical path, but maybe in the future when performance becomes a chief concern we can consider it for sure
d
Oh sure, if you want to create an issue we can certainly track this. Like you said, it's not a critical path item, but if it has a lot of interest I'm sure we could carve out a spot in the roadmap for it. Otherwise, I'd be happy to help anyone who is interested in hacking at it.
k
I definitely think this is a perf improvement that we can perform in the future
TBH, the simplified execution model today makes it easier to follow for users. It's like invoking a function
you cannot invoke a function, with partial inputs right?
if you look at the programming model here, it seems like invoking a function. But, you are right, we can absolutely optimize this as a post compilation pass (optimization pass), which we intend to in the future
d
yea but you'd also expect a function to be represented in the call stack - as it were 😉
to borrow the analogy and return to the original question
in any case, thanks for the explanation! if we're ranking priorities i think we'd vote for lps/workflows behaving more similarly given their semantic similarity in user code (at least in flytekit). ie it's a bit difficult to understand whether you're invoking a subworkflow or an lp (i thought workflows were wrapped in default lp's?) but there are some important implications in the system. or perhaps having them be semantically different in the api
👍 1
i realize this is also probably a non trivial change
also sorry we only come to you when we have problems 😄 flyte is a great product!
😆 1