Victor Delépine
08/21/2023, 2:26 PM
Franco Bocci
08/21/2023, 2:55 PM
Victor Churikov
08/21/2023, 2:58 PM
> The worst case for FlytePropeller is workflows that have an extremely large fan-out. This is because FlytePropeller implements a greedy traversal algorithm that tries to evaluate all unblocked nodes within a workflow in every round. A solution for this is to limit the maximum number of nodes that can be evaluated. This can be done by setting max-parallelism for an execution. This can be done in multiple ways
It seems to be a setting for limiting the number of nodes that can run concurrently within an execution
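For illustration, a minimal sketch of one of those ways, assuming flytectl's workflow-execution-config accepts a max_parallelism attribute; the project, domain, and value below are placeholders:
# wec.yaml -- hypothetical attribute file for flytectl
domain: development
project: flytesnacks
max_parallelism: 25  # assumed cap on how many nodes Propeller runs concurrently per execution
This would be applied with something like flytectl update workflow-execution-config --attrFile wec.yaml; the same knob can also be set on a launch plan or a single execution.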
Franco Bocci
08/21/2023, 2:59 PMmax_parallelism
would applyVictor Churikov
08/21/2023, 3:01 PM
> The worst case for FlytePropeller is workflows that have an extremely large fan-out.
I've also had issues with Flyte console being unable to display large execution graphs sometimes (https://github.com/flyteorg/flyte/issues/3803)
Franco Bocci
08/21/2023, 3:06 PM
Victor Churikov
08/21/2023, 3:09 PM
queue:
  batch-size: -1
  batching-interval: 2s
  queue:
    base-delay: 5s
    capacity: 1000
    max-delay: 120s
    rate: 100
    type: maxof
  sub-queue:
    capacity: 1000
    rate: 100
    type: bucket
So you should be able to configure a higher capacity for the queue to keep your 100,000 executions in the queue
But this does not limit the number of executions that can run at the same time
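As an illustrative sketch of that higher-capacity idea (not from the thread): the raised values and the propeller: wrapper below are assumptions, and where this block lives (configmap vs. Helm values) depends on your deployment.
propeller:
  queue:
    batch-size: -1
    batching-interval: 2s
    queue:
      base-delay: 5s
      capacity: 200000   # raised from 1000 so a ~100,000-execution backlog fits in the queue
      max-delay: 120s
      rate: 100
      type: maxof
    sub-queue:
      capacity: 200000   # raised from 1000 to match
      rate: 100
      type: bucket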
Victor Delépine
08/21/2023, 3:30 PM
Franco Bocci
08/21/2023, 3:39 PM
Victor Churikov
08/21/2023, 3:44 PM
• No more than 110 pods per node
• No more than 5,000 nodes
• No more than 150,000 total pods
• No more than 300,000 total containers
So this should probably run on a multi-cluster setup: https://docs.flyte.org/en/latest/deployment/deployment/multicluster.html
Victor Delépine
08/21/2023, 3:48 PM
Ketan (kumare3)