helpful-book-73672
08/09/2023, 7:58 PM
maxArrayJobSize to 30k to support some very large jobs. we're worried about the case where users submit large workflows without setting a concurrency value, which implies unbounded concurrency. this could impact other workloads that share the k8s cluster (e.g. kube-scheduler might struggle to schedule so many pods all at once?)
tall-lock-23197
hallowed-mouse-14616
08/10/2023, 2:39 PM
ArrayNode implementation (releasing in v1.9) is an overhaul of maptasks to fix a number of issues (RFC here). It's being released as an experimental feature, so there might be a few small bugs to iron out. With respect to concurrency levels, we think it makes sense to have maptasks using ArrayNodes default to the same concurrency as the parent workflow. For example, if the workflow has max-parallelism of 30, then the maptask would inherit this in tandem with other running tasks. IMO it is the intuitive approach, but we would also allow for manually setting other concurrency levels (current approach). I'm interested in hearing your thoughts on this.
helpful-book-73672
08/10/2023, 5:39 PM
maxParallelism would be passed down as a default to ArrayNodes as well?
full-ram-17934
08/10/2023, 5:56 PM
> For example, if the workflow has max-parallelism of 30, then the maptask would inherit this in tandem with other running tasks.
> but we would also allow for manually setting other concurrency levels
I think the above all makes sense, but it would still be very desirable to be able to enforce some global limits for sanity on the backend, both for the propeller-level maxParallelism and the map-task-level concurrency. However, agreed that the 80/20 is served by just not having the default concurrency be unbounded!
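A minimal sketch of how the pieces discussed above could compose: an explicit map-task concurrency, a default inherited from the workflow's max-parallelism, and a backend-wide cap. This is not Flyte's implementation. The function name, its parameters, and the convention that 0 or None means "unset" are all assumptions made for illustration.

```python
from typing import Optional


def resolve_concurrency(
    map_task_concurrency: Optional[int],
    workflow_max_parallelism: Optional[int],
    global_cap: int,
) -> int:
    """Pick the effective concurrency for one map task (hypothetical helper).

    Precedence assumed here: an explicit map-task value wins, otherwise the
    parent workflow's max-parallelism is inherited, and the result is always
    clamped to an operator-defined global cap so it can never be unbounded.
    """
    if global_cap < 1:
        raise ValueError("global_cap must be at least 1")
    # 0 and None are both treated as "not set" in this sketch.
    requested = map_task_concurrency or workflow_max_parallelism
    if not requested or requested < 1:
        # Nothing set anywhere: fall back to the cap instead of "unbounded".
        return global_cap
    return min(requested, global_cap)
```

With a cap of 100, an unset map task under a workflow with max-parallelism 30 would resolve to 30, while a request for 30000 would be clamped to 100.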
full-ram-17934
08/10/2023, 6:02 PM
> For example, if the workflow has max-parallelism of 30, then the maptask would inherit this in tandem with other running tasks.
Does this imply that in the default case the max-parallelism limit would be enforced on map tasks, inclusive of the other tasks that may be running for a workflow? I.e., not an independent limit per map task but more of a workflow global limit? That would be pretty rad, if a slight departure from current semantics.
hallowed-mouse-14616
08/10/2023, 7:43 PM
hallowed-mouse-14616
08/10/2023, 7:45 PM
> Does this imply that in the default case the max-parallelism limit would be enforced on map tasks, inclusive of the other tasks that may be running for a workflow?
Yes, this is exactly correct. With ArrayNode, along with the multitude of other fixes, this is my dream for the default. I made an issue for this a few days ago, hopefully can get to hacking it together!
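To make the "workflow global limit" semantics concrete, here is a toy simulation, assuming a single shared budget that map-task subtasks and ordinary tasks both draw from. It uses plain asyncio rather than anything from Flyte. The names run_workflow, map_sizes, and other_tasks are hypothetical, and the fake tasks do no real work.

```python
import asyncio


async def run_workflow(map_sizes, other_tasks, max_parallelism):
    """Run fake tasks under one shared limit and return the peak concurrency."""
    # One budget for the whole workflow, not one per map task.
    budget = asyncio.Semaphore(max_parallelism)
    running = 0
    peak = 0

    async def fake_task():
        nonlocal running, peak
        async with budget:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0)  # yield so other tasks get a chance to start
            running -= 1

    # Map-task subtasks and ordinary tasks all draw from the same semaphore.
    total = sum(map_sizes) + other_tasks
    await asyncio.gather(*(fake_task() for _ in range(total)))
    return peak


# Two map tasks (100 and 50 subtasks) plus 5 ordinary tasks, limit of 30.
peak = asyncio.run(run_workflow(map_sizes=[100, 50], other_tasks=5, max_parallelism=30))
print(peak)
```

Under this model the peak never exceeds 30 no matter how many map tasks run side by side. With an independent limit per map task, the worst case would instead grow with the number of map tasks in the workflow.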