# ask-the-community
g
is there any way to set cluster-level config to limit the concurrency of map tasks? (I don't believe so, but wanted to make sure) for context, we're looking at increasing
maxArrayJobSize
to 30k to support some very large jobs. we're worried about the case where users submit large workflows without setting a concurrency value, which implies unbounded concurrency. this could impact other workloads that share the k8s cluster (e.g. kube-scheduler might struggle to schedule so many pods all at once?)
s
I don't think the concurrency of map tasks can be set at the cluster-level config. @Dan Rammer (hamersaw), could you please confirm?
d
@Greg Dungca, Samhita is correct. There is currently no global configuration for the number of concurrent Pods, nor a default configuration for maptask concurrency. This is something we have discussed updating. The new
ArrayNode
implementation (releasing in v1.9) is an overhaul of maptasks that fixes a number of issues (RFC here). It's being released as an experimental feature, so there may be a few small bugs to iron out. With respect to concurrency levels, we think it makes sense for maptasks using ArrayNodes to default to the same concurrency as the parent workflow. For example, if the workflow has
max-parallelism
of 30, then the maptask would inherit this in tandem with other running tasks. IMO it's the intuitive approach, but we would also allow for manually setting other concurrency levels (the current approach). I'm interested to hear your thoughts on this!
g
thanks Samhita and Dan for confirming. yes, that seems intuitive to me too with this new ArrayNode implementation. with that approach, I assume the platform-level config
maxParallelism
would be passed down as a default to ArrayNodes as well?
m
For example, if the workflow has
max-parallelism
of 30, then the maptask would inherit this in tandem with other running tasks.
but we would also allow for manually setting other concurrency levels
I think the above all makes sense, but it would still be very desirable to be able to enforce some global limits on the backend for sanity -- both for the propeller-level maxParallelism and the map-task-level concurrency. That said, agreed that the 80/20 is served by just not having the default concurrency be unbounded!
Looking at the RFC though, and reading what you said here again:
For example, if the workflow has
max-parallelism
of 30, then the maptask would inherit this in tandem with other running tasks.
Does this imply that in the default case the max-parallelism limit would be enforced on map tasks, inclusive of the other tasks that may be running for a workflow? I.e., not an independent limit per map task but more of a workflow-global limit? That would be pretty rad, if a slight departure from current semantics.
d
So there is a resource manager component in flytepropeller. TBH it's not something I have ever touched, but maybe there's somebody at Union who knows more about it than I do. The idea is a token-allocation system for global resource limits. IIUC this could be used to restrict the number of concurrent tasks across workflows. It wouldn't work with the current maptask because of implementation limitations, but it should work with ArrayNode. Maybe it would be something to look into, but like I said - not sure how strongly maintained and battle-tested it is.
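A token-allocation scheme like the one described can be sketched as follows (names like `TokenPool` are hypothetical, not the real flytepropeller API): a fixed budget of tokens is shared across workflows, and a task either gets a token or is denied and must retry later.

```python
# Sketch of a global token pool: allocation is idempotent per token id,
# denied when the budget is exhausted, and freed tokens can be re-granted.
class TokenPool:
    def __init__(self, limit):
        self.limit = limit
        self.held = set()

    def allocate(self, token_id):
        """Return True if a token is granted, False if the pool is exhausted."""
        if token_id in self.held:
            return True  # idempotent re-grant for a retrying task
        if len(self.held) >= self.limit:
            return False  # caller should back off and retry
        self.held.add(token_id)
        return True

    def release(self, token_id):
        self.held.discard(token_id)

pool = TokenPool(limit=2)
assert pool.allocate("wf1/task-a")
assert pool.allocate("wf2/task-b")
assert not pool.allocate("wf3/task-c")  # budget exhausted across workflows
pool.release("wf1/task-a")
assert pool.allocate("wf3/task-c")      # freed token re-granted
```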
Does this imply that in the default case the max-parallelism limit would be enforced on map tasks, inclusive of the other tasks that may be running for a workflow?
Yes, this is exactly correct. With
ArrayNode
, along with the multitude of other fixes, this is my dream for the default. I made an issue for this a few days ago; hopefully I can get to hacking it together!