Thread
#flytekit
    Varun Kulkarni

    6 months ago
    👋 I’m hoping to get a better understanding of when a workflow is considered “executed”. I have a series of workflows that are structured as follows:
    • Workflow A: A prerequisite workflow that does some initial data processing
    • Workflow B: A downstream consumer of WF A, that invokes WF A as a subworkflow in execution
    • Workflow C: A downstream consumer of WF A, that invokes WF A as a subworkflow in execution - has no relationship to Workflow B other than the shared dependency to A
    I launched Workflows B and C sequentially, and as expected:
    • A was invoked from B, and the Flyte console indicates that A’s output was written to cache
    • A was then invoked from C, and the Flyte console indicates that A’s output was read from cache
    However, when I look at the Flyte console for WF A, it indicates that no executions can be found. That leads me to believe that a workflow is only considered executed if it is explicitly launched by a user, and that sub-workflow executions are not assigned their own execution IDs. Is my understanding correct?
    Dan Rammer (hamersaw)

    6 months ago
    Hi @Varun Kulkarni, great question! The statement "sub-workflow executions are not assigned their own execution IDs" is correct. This is actually rather complex, so I'll try to explain, but please let me know if anything doesn't make sense.
    When you use a subworkflow as part of another workflow (as in your example of Workflows B and C using Workflow A), Flyte compiles that subworkflow into the execution DAG. This execution can then be thought of as "inline" in a sense: the separate execution (of Workflow A) will not be tracked, but all of its tasks are executed according to the dependencies.
    If you are looking to track a separate execution, I believe that launchplans allow you to do this. The idea of a launch plan is to start a separate execution of a workflow (in your example Workflow A); then, during execution of the parent workflow (in your example Workflows B and C), the parent tracks the external execution.
    @Ketan (kumare3) @Haytham Abuelfutuh Can you confirm that launchplans allow subworkflow executions to be tracked in the UI? ^^^
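The distinction Dan describes can be sketched with a small toy model. This is not flytekit or Flyte's actual implementation: the `Workflow` and `Console` classes, the `exec-N` ID format, and the task names are all hypothetical and exist only to illustrate which invocations end up as tracked executions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count


@dataclass
class Workflow:
    """Hypothetical stand-in for a workflow definition (not a flytekit class)."""
    name: str
    tasks: list[str]
    subworkflows: list[Workflow] = field(default_factory=list)  # invoked inline
    launch_plans: list[Workflow] = field(default_factory=list)  # invoked via launch plan


class Console:
    """Toy registry of tracked executions, keyed by execution ID."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self.executions: dict[str, dict] = {}

    def launch(self, wf: Workflow) -> str:
        exec_id = f"exec-{next(self._next_id)}"
        nodes = list(wf.tasks)
        for sub in wf.subworkflows:
            # Subworkflow: its tasks are compiled into the parent's DAG,
            # so they run under the parent's execution ID.
            nodes += [f"{sub.name}.{task}" for task in sub.tasks]
        for target in wf.launch_plans:
            # Launch plan: a separate execution is started, and the parent
            # only holds a node that tracks it.
            nodes.append(f"launchplan -> {self.launch(target)}")
        self.executions[exec_id] = {"workflow": wf.name, "nodes": nodes}
        return exec_id

    def executions_of(self, name: str) -> list[str]:
        return [i for i, e in self.executions.items() if e["workflow"] == name]


wf_a = Workflow("A", tasks=["preprocess"])
wf_b = Workflow("B", tasks=["train"], subworkflows=[wf_a])
# Variant of C that reaches A through a launch plan instead of a subworkflow.
wf_c_lp = Workflow("C", tasks=["report"], launch_plans=[wf_a])
```

In this model, launching `wf_b` records nothing under A (matching the "no executions found" observation), while launching `wf_c_lp` records a second execution whose workflow is A.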
    Varun Kulkarni

    6 months ago
    I see, thanks for the quick response! As a follow-up, does this also mean that sub-workflows would also not emit any WorkflowExecutionEvents? Put differently, if we invoke A as a sub-workflow from B, can we expect to see WorkflowExecutionEvents emitted for workflows A and B with the same execution ID? Or would Flyte only emit NodeExecutionEvents for all of the tasks that define A over the course of executing B?
    Dan Rammer (hamersaw)

    6 months ago
    Correct, subworkflows will not emit specific WorkflowExecutionEvents; rather, Flyte will only emit events for the parent workflow, which will include NodeExecutionEvents. So in your example, Workflows B and C will emit NodeExecutionEvents for all of the tasks in Workflow A with the execution IDs of the top-level executions (i.e. B and C), respectively.
    Again, the use of launchplans means that Workflow A will be executed as a separate execution and therefore will emit its own WorkflowExecutionEvents, if that is the desired behavior.
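As a rough illustration of the event behavior described above, here is a toy event stream. The `Event` tuple, the `emit_events` helper, and the IDs `exec-a` / `exec-b` are hypothetical; real Flyte events carry many more fields, and the parent's launch plan node is omitted for brevity.

```python
from typing import NamedTuple


class Event(NamedTuple):
    kind: str          # "WorkflowExecutionEvent" or "NodeExecutionEvent"
    execution_id: str  # ID of the execution the event is reported under
    subject: str       # workflow or node the event describes


def emit_events(execution_id, workflow, tasks, inlined=None):
    """Toy event stream for one tracked execution.

    `inlined` maps a subworkflow name to its task names (hypothetical shape).
    """
    events = [Event("WorkflowExecutionEvent", execution_id, workflow)]
    events += [Event("NodeExecutionEvent", execution_id, t) for t in tasks]
    for sub, sub_tasks in (inlined or {}).items():
        # Inlined subworkflow: node events only, under the parent's ID.
        events += [
            Event("NodeExecutionEvent", execution_id, f"{sub}.{t}")
            for t in sub_tasks
        ]
    return events


def workflow_events(events):
    """Return (workflow, execution_id) for each workflow-level event."""
    return [
        (e.subject, e.execution_id)
        for e in events
        if e.kind == "WorkflowExecutionEvent"
    ]


# A invoked from B as a subworkflow: everything is reported under B's ID.
as_subworkflow = emit_events("exec-b", "B", ["train"], inlined={"A": ["preprocess"]})

# A invoked through a launch plan: A is its own execution with its own ID.
as_launch_plan = emit_events("exec-b", "B", ["train"]) + emit_events(
    "exec-a", "A", ["preprocess"]
)
```

Under these assumptions, `workflow_events(as_subworkflow)` contains an entry for B only, whereas `workflow_events(as_launch_plan)` also contains one for A under its own ID.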
    Ketan (kumare3)

    6 months ago
    this is correct. Thank you for the great answer Dan
    Varun Kulkarni

    6 months ago
    Yes, that is the desired behavior for our application - I’ll try to invoke A with a launchplan and let you know whether we can observe a dedicated WorkflowExecutionEvent / unique execution ID for A. Really appreciate the help here, thanks!
    Ketan (kumare3)

    6 months ago
    I am sure you can
    as you get a new execution ID
    Varun Kulkarni

    6 months ago
    ok great!
    CC @Sohan Shah @Dylan Wilder
    Sören Brunk

    6 months ago
    This looks like a good candidate for the Q&A section to me.
    Dylan Wilder

    6 months ago
    thanks for this explanation everyone! however one point of confusion for me is that it seems contradictory, in that subworkflows aren't compiled into the dag and act like "barriers" to further execution: https://flyte-org.slack.com/archives/CREL4QVAQ/p1646854441257309
    are both true?
    Dan Rammer (hamersaw)

    6 months ago
    @Dylan Wilder good point. Maybe it would be good to clarify how subworkflows are included in the DAG. There is a single-node entrypoint called the SubworkflowNode; it kind of acts as a start point for the subworkflow execution. As you observed, this node requires all of the inputs for the subworkflow to be available before entering the subworkflow execution. As you pointed out, there is certainly an optimization where we could remove this node and just connect the subworkflow into the DAG based solely on input dependencies. I think this would be awesome. Off the top of my head I'm not sure of the effort needed to implement this, however I'm sure it's a non-negligible amount of work. Is it something you (or your colleagues) might be interested in helping out with?
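To make the "barrier" effect concrete, here is a minimal sketch comparing the two scheduling behaviors discussed. It is a simplification, not Flyte's scheduler: the function names, the input names `x` and `y`, and the times are hypothetical, and each inner task is assumed to depend only on the subworkflow's inputs.

```python
def start_times_with_barrier(input_ready, task_inputs):
    """Single entry node: no inner task starts until every input is ready."""
    gate = max(input_ready.values())
    return {task: gate for task in task_inputs}


def start_times_flattened(input_ready, task_inputs):
    """Hypothetical optimization: each task waits only for its own inputs."""
    return {
        task: max(input_ready[name] for name in needed)
        for task, needed in task_inputs.items()
    }


# Times (arbitrary units) at which each subworkflow input becomes available.
input_ready = {"x": 1, "y": 10}
# Which subworkflow inputs each inner task consumes.
task_inputs = {"f": ["x"], "g": ["y"]}

barrier = start_times_with_barrier(input_ready, task_inputs)
flattened = start_times_flattened(input_ready, task_inputs)
```

Here task `f` only needs `x`, yet with the entry node it waits for `y` as well; in the flattened variant it could start as soon as `x` is ready.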
    Dylan Wilder

    6 months ago
    right now, we only have capacity to help if the scope is on the level of a weekend hack (i might be interested). otherwise it's not in our critical path, but maybe in the future when performance becomes a chief concern we can consider it for sure
    Dan Rammer (hamersaw)

    6 months ago
    Oh sure, if you want to create an issue we can certainly track this. Like you said, it's not a critical path item, but if it has a lot of interest I'm sure we could carve out a spot in the roadmap for it. Otherwise, I'd be happy to help anyone who is interested in hacking at it.
    Ketan (kumare3)

    6 months ago
    I definitely think this is a perf improvement that we can perform in the future
    TBH, the simplified execution model today makes it easier to follow for users. It's like invoking a function
    you cannot invoke a function with partial inputs, right?
    if you look at the programming model here, it seems like invoking a function. But, you are right, we can absolutely optimize this as a post-compilation pass (optimization pass), which we intend to do in the future
    Dylan Wilder

    6 months ago
    yea but you'd also expect a function to be represented in the call stack - as it were 😉
    to borrow the analogy and return to the original question
    in any case, thanks for the explanation! if we're ranking priorities i think we'd vote for lps/workflows behaving more similarly given their semantic similarity in user code (at least in flytekit). ie it's a bit difficult to understand whether you're invoking a subworkflow or an lp (i thought workflows were wrapped in default lp's?) but there are some important implications in the system. or perhaps having them be semantically different in the api
    i realize this is also probably a non trivial change
    also sorry we only come to you when we have problems 😄 flyte is a great product!