https://flyte.org
#ask-the-community

Olivier Sevin

04/01/2024, 4:02 PM
We have some workflows that seem to be triggering a memory leak in flytepropeller. Within a dynamic task there's a map task, sometimes with up to ~5000 subtasks, and even though the subtasks all show succeeded, the map task itself doesn't for a very long time (sometimes it eventually does). In the meantime, memory in flytepropeller keeps rising, and even if the map task eventually succeeds or is aborted, the memory usage remains. Possibly we're drawing connections between the memory usage and the workflows that aren't there, but we're wondering if anyone has seen anything like this or has ideas about what could be happening?

Paul Dittamo

04/01/2024, 4:45 PM
are you using the legacy/old map tasks or the array node map task implementation?
does the memory usage stay elevated after the task and/or workflow succeeds?

Olivier Sevin

04/01/2024, 4:57 PM
Legacy map tasks and the memory usage does remain elevated after it succeeds or we abort them (though it stops going up)
will update with heap profiles this afternoon

Ketan (kumare3)

04/02/2024, 12:01 AM
@Olivier Sevin FlytePropeller has a big cache that is greedy and will grow with usage. It GCs at 70% of usage by default, and you can adjust this. This is usually not a leak but by design.
but 5000 map tasks with very, very large outputs per task may cause propeller to really thrash memory

Olivier Sevin

04/02/2024, 12:02 AM
We think we mitigated this by disabling caching in the Flyte workflow. Not sure if this is helpful, but here's a heap diagram (diff-based against a profile taken 35 minutes earlier, while memory usage was steadily rising)

Ketan (kumare3)

04/02/2024, 12:05 AM
close to 5GB in the task handler

Robert Deaton

04/02/2024, 12:05 AM
The total size of inputs.pb to these map tasks weighs in at ~7MB. Outputs are even smaller, O(100 bytes) per task, just a GCS path

Ketan (kumare3)

04/02/2024, 12:05 AM
this is not good, would love to see an example of the workflow
hmm 7MB per task? so 5k * 7MB?
disabling cache does not sound good

Robert Deaton

04/02/2024, 12:06 AM
7MB is the total inputs.pb to the map task
I would like to think it doesn't need a separate copy of the inputs for each of the map tasks. We did hypothesize that briefly, but at the very least we ruled out that it's loading that inputs.pb 5k times from object storage (though I suppose the fetch from object storage could be cached and it just unmarshals it once for each task inside the map task)

Ketan (kumare3)

04/02/2024, 12:08 AM
it does not need it
but this is an interesting use case: how you reached 4GB of usage

Robert Deaton

04/02/2024, 12:12 AM
The workflow is really rather simple
Screenshot 2024-04-01 at 5.12.01 PM.png
Screenshot 2024-04-01 at 5.13.19 PM.png

Olivier Sevin

04/02/2024, 12:13 AM
Oh, just noticed the 70% GC comment. Forgot to mention we were actually OOMing, even with 128Gi
r

Robert Deaton

04/02/2024, 12:17 AM
outputs.pb of that create_map_inputs run is 7MB, corresponding to 4.7k tasks in the map task
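A back-of-the-envelope check on the "5k * 7MB" question, using the figures given in the thread (7MB total inputs for ~4.7k subtasks). This assumes MB = 10^6 bytes and an even split across subtasks; it is only arithmetic, not a measurement of what propeller actually holds in memory.

```latex
\begin{aligned}
\text{shared input, per subtask} &: \frac{7\,\text{MB}}{4{,}700} \approx 1.5\,\text{KB} \\
\text{one full copy per subtask} &: 4{,}700 \times 7\,\text{MB} = 32{,}900\,\text{MB} \approx 33\,\text{GB}
\end{aligned}
```

The gap between the two figures is why it matters whether propeller shares one copy of the map task inputs or materializes one per subtask.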

Ketan (kumare3)

04/02/2024, 12:31 AM
I think it's not about simple or complex; we will have to see the actual values

Robert Deaton

04/02/2024, 12:32 AM
Mind being specific about what additional things it'd be useful to see here? We're happy to pull together whatever we can
k

Ketan (kumare3)

04/02/2024, 12:46 AM
I would love to see a representative workflow so that I can reproduce it
it might be a leak, but I would love to reproduce it
it seems like the 5k concurrent tasks are what caused it

Robert Deaton

04/02/2024, 1:04 AM
I unfortunately cannot share the full workflow; there's lots of company code in there. A couple of options: we're always happy to hop on a call and do some live debugging, or I could try to make a version of the workflow with stubbed implementations that have the same inputs/outputs to see if we can reproduce it that way. The inputs and outputs are generally not themselves sensitive.
We're picking this up this morning and actively continuing debugging today as we have workloads pending. My current plan is to see if I can minimize this test case: use identical inputs and outputs from all the tasks in the workflow, run locally in a sandbox and see if I can reproduce the high memory usage or a memory leak there. Happy to run any other more specific tests if y'all have any hypotheses you'd want to test or specific data to capture

Paul Dittamo

04/02/2024, 9:04 PM
@Robert Deaton a repro would be great. Thank you. Looking forward to figuring this one out.
cc: @Dan Rammer (hamersaw)

Dan Rammer (hamersaw)

04/08/2024, 7:40 PM
@Olivier Sevin / @Robert Deaton The Flyte single binary (and each individual component) exposes a Go pprof endpoint on port 10254 by default. Using this, we can see exactly where in the binary we're storing tons of data. If the Flyte binary is at localhost:10254, you can use:
wget -O heap.out http://localhost:10254/debug/pprof/heap
to retrieve a dump of the heap, and then something like:
go tool pprof -no_browser -http :8080 heap.out
to start a web server displaying the results, which should show something like the image below. It would be great if we could even get a dump of the heap, and we would be happy to look through the issues there.
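A minimal sketch of how a before/after pair of heap profiles, like the 35-minute diff mentioned earlier in the thread, could be captured and compared. Assumptions: curl is used instead of wget, the endpoint is the default localhost:10254 (set PPROF_BASE otherwise), and the helper names capture_heap and capture_heap_pair are hypothetical, not part of Flyte.

```shell
#!/usr/bin/env bash
# Sketch: capture two heap profiles from a Go pprof endpoint some time apart
# so they can be compared with `go tool pprof -diff_base`.
# Assumption: the pprof endpoint is reachable at PPROF_BASE (defaults to the
# port 10254 mentioned in the thread; override for your deployment).
PPROF_BASE="${PPROF_BASE:-http://localhost:10254}"

# capture_heap OUTFILE (hypothetical helper): save the current heap profile.
capture_heap() {
  curl -fsS -o "$1" "${PPROF_BASE}/debug/pprof/heap"
}

# capture_heap_pair SECONDS [PREFIX] (hypothetical helper): take a "before"
# snapshot, wait SECONDS, then take an "after" snapshot.
# Writes PREFIX-before.out and PREFIX-after.out (PREFIX defaults to "heap").
capture_heap_pair() {
  local wait_secs="$1"
  local prefix="${2:-heap}"
  capture_heap "${prefix}-before.out" || return 1
  sleep "$wait_secs"
  capture_heap "${prefix}-after.out"
}
```

For example, `capture_heap_pair 2100` takes snapshots about 35 minutes apart, and `go tool pprof -no_browser -http :8080 -diff_base heap-before.out heap-after.out` then shows only what changed between them. On Kubernetes, something like `kubectl -n flyte port-forward deploy/flytepropeller 10254:10254` can expose the port locally; the namespace and deployment name are assumptions that depend on your installation.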

Robert Deaton

04/08/2024, 8:46 PM
There’s a diff of the heap growing posted earlier in this thread
@Dan Rammer (hamersaw) we have the raw before/after that we used to generate the diff as well that we can share.
I haven't had luck reproducing locally yet, and work sometimes gets busy, but I'm making one more pass at reproducing this week.