Hey all I am looking for a way to get an execution s tasks a Flyte #flyte-support


bored-beard-89967

07/11/2023, 6:09 PM

Hey, all! I am looking for a way to get an execution’s tasks and their corresponding durations using an existing Python API. Is there a way to accomplish this?

bored-beard-89967

07/11/2023, 6:09 PM

Is there anything in flytekit that might help me here that I am missing?

bored-beard-89967

07/11/2023, 8:02 PM

This is what I have so far:


from flytekit.remote import FlyteRemote, FlyteWorkflowExecution, FlyteTaskExecution
from flytekit.configuration import Config

remote = FlyteRemote(
        config=Config.for_endpoint(endpoint=endpoint),
        default_project=project,
        default_domain=domain,
    )

# specific execution
execution = remote.fetch_execution(name="ffc0e12bb3e314fae9aa")
ex = remote.sync_execution(execution=execution, sync_nodes=True)
# ['n0'] is the node I am interested in
node_times = {x.metadata.spec_node_id: x.closure.duration.total_seconds() / 60 \
               for x in ex.node_executions['n0'].executions}

bored-beard-89967

07/11/2023, 8:03 PM

Gives me something like


{'end-node': 0.0,
 'n0': 2.698262133333333,
 'n1': 2.202614066666667,
 'n2': 1.59931375,
 'n3': 1.8271749,
 'n4': 30.83562241666667,
 'n5': 0.2830597833333333,
 'n6': 0.3337019666666666,
 'start-node': 0.0}

however, I am really struggling to map the node ids to the task name. Any thoughts?

magnificent-teacher-86590

07/11/2023, 8:24 PM

I think it's sequential, based on how it's defined in the dynamic tasks. Is there no node metadata associated with the node executions?

bored-beard-89967

07/11/2023, 8:25 PM


example = ex.node_executions['n0'].executions[2]
example.metadata

gives


retry_group: "0"
spec_node_id: "end-node"

bored-beard-89967

07/11/2023, 8:28 PM

and


example.id

gives


node_id: "n0-0-n1"
execution_id {
  project: "..."
  domain: "..."
  name: "ffc0e12bb3e314fae9aa"
}

magnificent-teacher-86590

07/11/2023, 8:29 PM

How about `task_node_metadata`?

bored-beard-89967

07/11/2023, 8:30 PM

As an attribute of `example`?

magnificent-teacher-86590

07/11/2023, 8:31 PM

So, I can get the task info from this:


execution.node_executions.get("n1").closure.task_node_metadata

bored-beard-89967

07/11/2023, 8:35 PM

`execution.node_executions.get('n0').closure.task_node_metadata` is None for me.

bored-beard-89967

07/11/2023, 8:35 PM

🤷

bored-beard-89967

07/11/2023, 8:42 PM

I am on `'1.4.2'` of flytekit, what about you @magnificent-teacher-86590?

magnificent-teacher-86590

07/11/2023, 8:44 PM

Same version.

magnificent-teacher-86590

07/11/2023, 8:44 PM

Did your execution succeed or fail?

bored-beard-89967

07/11/2023, 8:45 PM

Succeeds.

bored-beard-89967

07/11/2023, 8:46 PM

There are some subworkflows, which might be causing the task_node_metadata to be None?

magnificent-teacher-86590

07/11/2023, 8:46 PM

Oh, it's not a task but a subworkflow?

bored-beard-89967

07/11/2023, 8:47 PM

`execution = remote.fetch_execution(name="ffc0e12bb3e314fae9aa")` references a workflow execution where there might be a combination of subworkflows and tasks.

bored-beard-89967

07/11/2023, 8:48 PM

The nodes in


{'end-node': 0.0,
 'n0': 2.698262133333333,
 'n1': 2.202614066666667,
 'n2': 1.59931375,
 'n3': 1.8271749,
 'n4': 30.83562241666667,
 'n5': 0.2830597833333333,
 'n6': 0.3337019666666666,
 'start-node': 0.0}

can each be either a subworkflow or a task, to clarify.

bored-beard-89967

07/11/2023, 8:48 PM

But that is the depth I would like to achieve.

magnificent-teacher-86590

07/11/2023, 8:49 PM

So in JH, I just print the entire execution, and all of the attributes that are not None are displayed.

magnificent-teacher-86590

07/11/2023, 8:49 PM

jupyterhub

bored-beard-89967

07/11/2023, 8:53 PM

Interesting. My case is a bit weird, but my workflow is a single subworkflow containing subworkflows and tasks lol. These subworkflows and tasks are the level at which I would like the duration captured.

bored-beard-89967

07/11/2023, 8:53 PM

When I print the execution, I see the name of the subworkflow only.

magnificent-teacher-86590

07/11/2023, 8:58 PM

Yeah, without seeing an example or some code, I'm all out of ideas, haha.

bored-beard-89967

07/11/2023, 9:07 PM

No worries! Thanks for jumping in and helping!

bored-beard-89967

07/11/2023, 9:07 PM

I think I might be on to something. I will report back

👍 1

bored-beard-89967

07/11/2023, 9:18 PM

This gets me one step closer, but only includes tasks and not subworkflows:


from flytekit.remote import FlyteRemote
from flytekit.models.core.identifier import NodeExecutionIdentifier
from flytekit.clients.helpers import iterate_task_executions
from flytekit.configuration import Config

remote = FlyteRemote(
        config=Config.for_endpoint(endpoint=endpoint),
        default_project=project,
        default_domain=domain,
    )
execution = remote.fetch_execution(name="ffc0e12bb3e314fae9aa")
execution = remote.sync_execution(execution=execution, sync_nodes=True)

profiles = {}
for node_execution in execution.node_executions['n0'].executions:  
    nid = NodeExecutionIdentifier(node_id=node_execution.id.node_id, execution_id=execution.id)
    for t in iterate_task_executions(client=remote.client, node_execution_identifier=nid):
        profiles[t.id.task_id.name] = t.closure.duration

magnificent-teacher-86590

07/11/2023, 9:20 PM

Were you able to get the execution id of the subworkflow? Last time I tried, it was not possible.

bored-beard-89967

07/11/2023, 11:10 PM

I think I am managing to do this in the snippet below, though granted it is neither elegant nor well tested:


from flytekit.remote import FlyteRemote
from flytekit.models.core.identifier import NodeExecutionIdentifier
from flytekit.clients.helpers import iterate_task_executions
from flytekit.configuration import Config

remote = FlyteRemote(
        config=Config.for_endpoint(endpoint=endpoint),
        default_project=project,
        default_domain=domain,
    )
execution = remote.fetch_execution(name="ffc0e12bb3e314fae9aa")
execution = remote.sync(execution=execution, sync_nodes=True)

profiles = {}
def nested_task_info(node_executions):
    for ne in node_executions.values():
        if ne.id.node_id == "start-node" or ne.id.node_id == "end-node":
            continue
        if ne.metadata.is_parent_node:
            nested_task_info(ne.subworkflow_node_executions)
        else:
            nid = NodeExecutionIdentifier(node_id=ne.id.node_id, execution_id=execution.id)
            for t in iterate_task_executions(client=remote.client, node_execution_identifier=nid):
                profiles[t.id.task_id.name] = t.closure.duration

nested_task_info(execution.node_executions)

🙏 1
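Editor's note: the recursive walk in the snippet above can be tried out without a Flyte cluster. What follows is only a minimal sketch under stated assumptions: `fake_node`, `fake_fetch`, and `collect_task_durations` are hypothetical names, the node objects are stand-ins that mimic just the three attributes the snippet reads (`id.node_id`, `metadata.is_parent_node`, `subworkflow_node_executions`), and the task lookup is injected as a callable rather than calling `iterate_task_executions`. It also collects rows into a list rather than a dict keyed by task name, since a task that runs in more than one node would otherwise overwrite its earlier entry.

```python
from datetime import timedelta
from types import SimpleNamespace

# Bookkeeping nodes that never run a task.
SKIP = {"start-node", "end-node"}


def collect_task_durations(node_executions, fetch_task_durations, out=None):
    """Walk a {node_id: node_execution} mapping depth-first and collect
    (node_id, task_name, duration) rows for every leaf node."""
    out = [] if out is None else out
    for ne in node_executions.values():
        if ne.id.node_id in SKIP:
            continue
        if ne.metadata.is_parent_node:
            # Parent node (e.g. a subworkflow): recurse into its children.
            collect_task_durations(ne.subworkflow_node_executions, fetch_task_durations, out)
        else:
            for task_name, duration in fetch_task_durations(ne):
                out.append((ne.id.node_id, task_name, duration))
    return out


def fake_node(node_id, children=None):
    """Hypothetical stand-in for a synced node execution."""
    return SimpleNamespace(
        id=SimpleNamespace(node_id=node_id),
        metadata=SimpleNamespace(is_parent_node=children is not None),
        subworkflow_node_executions=children or {},
    )


def fake_fetch(ne):
    """Hypothetical lookup: every leaf ran one task named 'my_task' for 90 s."""
    return [("my_task", timedelta(seconds=90))]


tree = {
    "start-node": fake_node("start-node"),
    "n0": fake_node("n0", children={
        "n0-0-n0": fake_node("n0-0-n0"),
        "n0-0-n1": fake_node("n0-0-n1"),
    }),
    "n1": fake_node("n1"),
    "end-node": fake_node("end-node"),
}

rows = collect_task_durations(tree, fake_fetch)
for node_id, task_name, duration in rows:
    print(node_id, task_name, duration.total_seconds() / 60)
```

Against a real execution, the injected callable would presumably wrap the `iterate_task_executions` loop from the snippet above and yield `(t.id.task_id.name, t.closure.duration)` pairs.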

magnificent-teacher-86590

07/11/2023, 11:54 PM

Awesome, I will give it a go on my workflows.

freezing-airport-6809

07/12/2023, 6:56 AM

Cc @hallowed-mouse-14616 - there is a new API that returns the timeline -

👀 1

bored-beard-89967

07/12/2023, 1:16 PM

@freezing-airport-6809 does this exist today or is this in the works?

freezing-airport-6809

07/12/2023, 9:55 PM

Yes

bored-beard-89967

07/12/2023, 9:56 PM

I couldn’t find anything. If you could share more details that would be appreciated.

hallowed-mouse-14616

07/13/2023, 1:30 AM

@bored-beard-89967 I believe we're referring to the runtime metrics work I recently did. It's used to fuel the timeline breakdown in the UI (attached image). A good example of this API is the `pyflyte metrics explain` command (code here). Basically, it fetches all of the nested workflows up to a configurable depth and breaks the runtime down into multiple spans to better attribute the time spent during node execution.

😲 2
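Editor's note: the idea of fetching nested spans up to a configurable depth can be illustrated in plain Python. This is a sketch of the concept only, not the flytekit metrics API: the `Span` dataclass, the `flatten` helper, and all names and timings below are hypothetical and made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Span:
    """Hypothetical span: a named slice of runtime with nested child spans."""
    name: str
    seconds: float
    children: List["Span"] = field(default_factory=list)


def flatten(span: Span, max_depth: int, depth: int = 0) -> List[Tuple[int, str, float]]:
    """Return (depth, name, seconds) rows, descending at most max_depth levels."""
    rows = [(depth, span.name, span.seconds)]
    if depth < max_depth:
        for child in span.children:
            rows.extend(flatten(child, max_depth, depth + 1))
    return rows


# Made-up timings for a workflow with one subworkflow (n0) and one task node (n1).
root = Span("workflow", 100.0, [
    Span("n0", 60.0, [Span("task_a", 45.0), Span("task_b", 15.0)]),
    Span("n1", 40.0),
])

rows_depth1 = flatten(root, max_depth=1)  # stops at the top-level nodes
rows_depth2 = flatten(root, max_depth=2)  # also includes tasks inside n0

for depth, name, seconds in rows_depth2:
    print("  " * depth + f"{name}: {seconds:.1f}s")
```

A larger depth attributes time to more deeply nested nodes, at the cost of fetching more of the execution tree.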

bored-beard-89967

07/13/2023, 1:11 PM

Awesome!
