Flyte #flyte-support

worried-airplane-87065

06/20/2025, 6:37 PM

Hi Flyte community! We're running Flyte on GKE using the flyte-core deployment. I have a workflow that does GPU inference followed by CPU post-processing, and I want to invoke it on the order of 100-1500 times from a single parent workflow. Currently we're using @dynamic workflows to fan out with max-parallelism=200, but we're seeing a great deal of latency in workflow progress. Looking at https://www.union.ai/docs/flyte/deployment/flyte-configuration/performance/ and https://www.union.ai/docs/flyte/user-guide/core-concepts/workflows/subworkflows-and-sub-launch-plans/, it looks like we can achieve similar concurrency by invoking a sub-launch plan 100-1500 times and increasing the number of FlytePropeller workers. We have explored map_tasks, but they're a little too restrictive for our use case. Has anyone been in a similar situation and would be willing to share how they approached the fanout issue?

Using sub-launch plans for fanout:


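```python
import flytekit as fl


@fl.task
def my_gpu_task() -> None:
    # GPU inference step (placeholder)
    pass

@fl.task
def my_cpu_task() -> None:
    # CPU post-processing step (placeholder)
    pass

@fl.workflow
def my_workflow() -> None:
    # The tasks pass no data, so `>>` only enforces ordering: CPU runs after GPU
    my_gpu_task() >> my_cpu_task()

my_workflow_lp = fl.LaunchPlan.get_or_create(my_workflow)


@fl.dynamic
def dynamic_lp(num_fanout: int) -> None:
    # Each launch plan call starts a separate child workflow execution;
    # the launch plan has no outputs, so there is nothing to return
    for _ in range(num_fanout):
        my_workflow_lp()
```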
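With max-parallelism=200, the upper end of the range (1500 invocations) cannot all run at once. A back-of-the-envelope sketch in plain Python, assuming the cap applies to the child executions, that every child takes roughly the same time, and ignoring FlytePropeller round latency and pod scheduling; min_waves and min_wall_time are hypothetical helpers, not part of flytekit:

```python
def min_waves(num_fanout: int, max_parallelism: int) -> int:
    """Lower bound on back-to-back 'waves' of child executions.

    Assumes at most `max_parallelism` children run at once and that all
    children take about the same time (a simplification, not Flyte's scheduler).
    """
    if num_fanout < 0:
        raise ValueError("num_fanout must be non-negative")
    if max_parallelism < 1:
        raise ValueError("max_parallelism must be at least 1")
    # Integer ceiling division: ceil(num_fanout / max_parallelism)
    return (num_fanout + max_parallelism - 1) // max_parallelism


def min_wall_time(num_fanout: int, max_parallelism: int, child_minutes: float) -> float:
    """Lower bound on fanout wall time in minutes under the same assumptions."""
    return min_waves(num_fanout, max_parallelism) * child_minutes
```

For example, min_waves(1500, 200) is 8 while min_waves(100, 200) is 1. If the observed wall time is far beyond the number of waves times the per-child duration, the extra time is likely overhead elsewhere (for example propeller traversal or pod startup) rather than the parallelism cap alone.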

average-finland-92144

06/24/2025, 11:51 AM

Hey Chris, have you had a chance to better identify the source of the latency? Without resorting to map tasks, I think tweaking parameters like the worker count may help, but it's best to identify the bottleneck first. There is a Grafana propeller dashboard that tracks latency and workers.

average-finland-92144

06/24/2025, 11:51 AM

Also, hovering over the timeline view shows which phase the majority of the time is spent in.

average-finland-92144

06/24/2025, 11:52 AM

There, at least, we could determine whether the bottleneck is in the container bootstrap step or in the code execution itself.
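When the timeline view is unavailable, the same startup-versus-execution comparison can be made offline from per-node timestamps. A minimal sketch, assuming you have already collected, for each node, when it was queued, when its code started running, and when it finished; NodeTiming and phase_breakdown are hypothetical names, not a Flyte API, and how the timestamps are obtained is left open:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class NodeTiming:
    # Hypothetical record; field names are illustrative, not Flyte's schema.
    node_id: str
    queued_at: datetime    # when the node was queued
    started_at: datetime   # when user code started running (container up)
    finished_at: datetime  # when the node reached a terminal phase


def phase_breakdown(timings: list[NodeTiming]) -> dict[str, float]:
    """Median seconds spent before execution (queue + bootstrap) vs. executing."""
    if not timings:
        raise ValueError("no timings supplied")
    startup = [(t.started_at - t.queued_at).total_seconds() for t in timings]
    running = [(t.finished_at - t.started_at).total_seconds() for t in timings]
    return {"median_startup_s": median(startup), "median_running_s": median(running)}
```

A median startup time that dwarfs the median running time would point at the bootstrap side (for example scheduling, image pulls, or GPU node scale-up) rather than at the code itself.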

worried-airplane-87065

06/24/2025, 3:01 PM

Yeah, we have the Grafana propeller dashboard set up. During the workflow execution we see "Round traverse latency per workflow" peak at ~4 min. Based on my reading of the docs/code, is the propeller worker unable to poll the state of the workflow's tasks fast enough?

worried-airplane-87065

06/24/2025, 3:02 PM

The timeline view unfortunately doesn't load (maybe due to high fanout).
