# flyte-support
b
Hello - we've been experimenting with using the ArrayNode `map_task` in 1.13 for very large jobs (>60,000 tasks) and have found an interesting failure case. It seems like once the job size crosses a certain threshold, we consistently see failures on the same `map_task` batch. The error is a generic:
```
[324-325][328-329][332-333][338-339]: [1/1] currentAttempt done. Last Error: UNKNOWN::Outputs not generated by task execution
```
The relevant part of the execution graph is attached in a thread. Each `_run_experiments_batch` runs 3 sets of `map_task`, where each `map_task` runs 400 tasks.
[Attachment: Screenshot from 2024-08-01 16-51-22.png]
What's interesting is that whether we run a job of this size or double it, in both cases the failure happens at node `dn9`. In each case, it's always the 3rd `map_task` that fails on this node with the above error. When I attempt to view the logs in CloudWatch, I see the logs of the 1st (of 3) `map_task` on that node instead of the 3rd.
Recovering the job appears to have it complete successfully.
f
I think we only ran scale testing for ArrayNode up to ~5000 subtasks. And just to confirm, are you running an ArrayNode with 60,000 subtasks? I figured we'd run into etcd size limitations with that many, but that should lead to a different error. Had you run workloads of similar scale on the legacy map tasks? Also, what concurrency were you running the map task with? What Flyte version are you running for propeller, and what flytekit version? We should probably enrich the error message to include the file path. I'm unsure how/if the output prefix/file path could differ between when the subtask pod is built and when the output writer is created after the task succeeds.
f
5k is the limit; if you want 60k, you will have to run 12 copies of them. The state explodes and etcd cannot handle it
f
yup, and by breaking it into 12 copies, it'd be executing over 12 different launch plans. However, I'm unsure if the workflow is actually running 60k tasks. From my understanding, if the workflow state gets too large for etcd to handle, there should be an error at the controller level that fails the workflow with "Workflow execution state is too large for Flyte to handle" and then fails any running tasks
b
Each `map_task` runs 400 tasks, and each one of those dynamics in the screencap runs 3 of those `map_task`s in sequence, so roughly 1200 per dynamic. I then batch up the dynamics so that no more than 5 are running concurrently.
The 60,000+ number is from distributing the work in batches across those dynamic workflows.
So to recap - each ArrayNode individually does not run more than 400 tasks, each dynamic workflow should run roughly 1200, but each Flyte execution should be running on the order of ~60,000 total, with roughly no more than 2000 individual tasks running concurrently
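Roughly, the shape of it in code is something like this (a simplified sketch, not our actual code - task names, bodies, and the chunking logic are placeholders), using flytekit's `@dynamic` and `map_task`:

```python
from flytekit import dynamic, map_task, task, workflow


@task
def run_experiment(cfg: int) -> int:
    # placeholder for the real per-item work
    return cfg


@dynamic
def _run_experiments_batch(configs: list[int]):
    # up to 3 map_tasks of <=400 items each, chained so they run in sequence
    # (~1200 subtasks per dynamic node)
    prev = None
    for i in range(0, len(configs), 400):
        node = map_task(run_experiment)(cfg=configs[i:i + 400])
        if prev is not None:
            prev >> node  # enforce sequential ordering between the map_tasks
        prev = node


@workflow
def experiments_wf(configs: list[int]):
    # the real execution fans out ~50 of these dynamics, batched so at most 5 run at once
    _run_experiments_batch(configs=configs)
```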
@flat-area-42876 we just recently upgraded flytekit and propeller to 1.13 from 1.10.7. Before, we were using `FlyteRemote` to distribute our workload across separate pipeline executions because we couldn't hit nearly this many within one execution. I'm testing to see if the (noticeably improved!) scalability in 1.13 will allow us to move our entire workset into a single execution
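The `FlyteRemote` fan-out was roughly this shape (sketch only - the project, domain, launch plan name, and batch data below are placeholders, not our real values):

```python
from flytekit.configuration import Config
from flytekit.remote import FlyteRemote

remote = FlyteRemote(
    Config.auto(),                   # picks up the local Flyte config
    default_project="my-project",    # placeholder
    default_domain="development",    # placeholder
)

lp = remote.fetch_launch_plan(name="experiments.pipeline")  # placeholder name

batched_configs = [[1, 2, 3], [4, 5, 6]]  # stand-in for the real batches

# one execution per batch instead of one giant execution
for configs in batched_configs:
    remote.execute(lp, inputs={"configs": configs})
```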
I bumped the resource requests on the dynamic workflows (does this have an impact?) and on a task we use to materialize results, and was able to complete 2 runs of ~60,000 total tasks. I'm testing now with a very large run of ~150,000 tasks (same structure as above, but with the ArrayNodes bumped from 400 to 500). If I want to complete very large runs like this from a single execution, should I expect to need to bump the ArrayNodes up into the thousands? One of our task types has variable duration, so I've been aiming to keep the ArrayNodes fairly constrained so that a small number of long-running tasks don't block progress
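(By "bumped the resource requests on the dynamic workflows" I mean something along these lines - assuming `@dynamic` accepts the same `requests`/`limits` arguments as `@task`; the values are placeholders:)

```python
from flytekit import Resources, dynamic


@dynamic(requests=Resources(cpu="2", mem="8Gi"), limits=Resources(cpu="4", mem="16Gi"))
def _run_experiments_batch(configs: list[int]):
    ...  # same body as before
```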
That very large run failed - what I'm looking for is to understand whether there is an upper bound on the total number of nodes in an execution or in a single dynamic
f
Upper bound should be around 5k nodes in the execution before you start running into etcd issues. You could increase the scale of a single execution by kicking off external workflows from a parent workflow
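Something like this, for example (a sketch with placeholder names) - calling a LaunchPlan from inside a workflow launches it as its own child execution, so each child carries its own workflow state:

```python
from flytekit import LaunchPlan, dynamic, task, workflow


@task
def run_chunk(configs: list[int]) -> int:
    # hypothetical stand-in for the real work
    return len(configs)


@workflow
def child_wf(configs: list[int]) -> int:
    return run_chunk(configs=configs)


child_lp = LaunchPlan.get_or_create(child_wf, name="child_wf_lp")  # placeholder name


@dynamic
def parent(all_configs: list[list[int]]):
    for configs in all_configs:
        child_lp(configs=configs)  # each call becomes a separate child execution
```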
b
An entire `map_task` is a single ArrayNode, right? In my case I'm running into issues with 50 dynamic workflows that each have 3 `map_task`s and 2 tasks to materialize results, so ~250 nodes
f
Yup, it's a single node. I'm skeptical that the `Last Error: UNKNOWN::Outputs not generated by task execution` is tied to etcd limitations. This error occurs when the task is successful but then propeller's storage client gets a "Not found" when trying to read the output file. Can you check if the output file exists in your blob store?
Are you using the default stow_store DataStore client as well?
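For the blob store check, a quick way is something like the following, assuming S3 (the bucket and key below are placeholders - use the real outputs URI from the failed node's details):

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket = "my-flyte-bucket"                   # placeholder
key = "metadata/propeller/.../0/outputs.pb"  # placeholder; copy the real output prefix

try:
    s3.head_object(Bucket=bucket, Key=key)
    print("outputs file exists")
except ClientError as e:
    if e.response["Error"]["Code"] in ("404", "NoSuchKey"):
        print("outputs file is missing")
    else:
        raise
```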