# announcements
n
Hi All, I’m trying to run a large workflow on our GKE cluster with many thousands of parallel tasks. Is there a recommended Flyte config to make sure Flyte is maximizing our compute resources? For example, Helm chart settings that might allow more tasks to be scheduled in parallel. More information about my current situation in thread.
Currently our GCP account can support about 2500-3000 tasks in parallel, based on the CPU/memory requirements of these Flyte tasks. Initially the workflow started and FlytePropeller scheduled and ran about 500 tasks in parallel. I figured it would ramp up over time as new nodes were provisioned. However, after about 30 minutes the number of concurrent pods dropped to around 150, and it has stayed that way ever since (over 60 minutes). We have about 40 nodes running currently and they are only using about 25% of CPU/memory, so I’m not sure why FlytePropeller isn’t scheduling more tasks. I also checked the CPU/memory quota in the namespace and it’s well under the limit.
cc: @Justin Tyberg
k
Cc @Dan Rammer (hamersaw) / @Haytham Abuelfutuh
A diagram of the fan-out would be interesting to see.
n
How can I share that with you?
The Flyte Console graph for the workflow is not very interesting.
k
Is it just a fan-out of 2500/5000?
If so, why not use a map task?
n
It’s 100 input files, each of which generates 1,280 tasks, so 128,000 tasks in total.
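To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only figures quoted in this thread (100 files, 1,280 tasks per file, 2500-3000 parallel slots, about 150 pods observed); the helper name `min_waves` is made up for illustration, and it assumes every task takes roughly the same time, which may not hold.

```python
import math

# Figures quoted in this thread.
input_files = 100
tasks_per_file = 1280
total_tasks = input_files * tasks_per_file


def min_waves(total: int, parallel_slots: int) -> int:
    """Lower bound on the number of sequential 'waves' needed to run
    `total` tasks when at most `parallel_slots` can run at once.
    Assumes equal task durations (an idealization)."""
    return math.ceil(total / parallel_slots)


print("total tasks:", total_tasks)
print("waves at 3000 slots:", min_waves(total_tasks, 3000))
print("waves at 2500 slots:", min_waves(total_tasks, 2500))
print("waves at 150 slots:", min_waves(total_tasks, 150))
```

The gap between the 150-slot and 3000-slot cases is why the drop in concurrent pods matters so much for total runtime.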
k
Ohh, in a fan-out on one graph?
I think there are a few things to look at here: max parallelism, number of workers, k8s config, etc.
n
It’s in one workflow execution currently, so I assume one graph?
I can launch one workflow execution per file if that would help, but I’d like to understand the config options available to our deployment, like you said: max parallelism, etc.
Currently we only have 1 FlytePropeller pod with 1 CPU and 2 GB of memory, and it doesn’t seem to be maxing out on either. Propeller also has 50 workers configured.
k
Hey Nicolas, do you have a couple of minutes to get on a VC?
One graph running 128,000 tasks, I don’t think that is supported.
n
Hi Ketan, yes, I can meet up, and I’m happy to reconfigure to stay within limits. It’s a very parallel workload, and I’m probably not setting everything up optimally.
k
OK, let me DM you. Ideally you should just use launch plans.
👍 1
n
@Ketan (kumare3) @Dan Rammer (hamersaw) splitting into a separate launch plan execution per input file worked. All of our available 3000 vCPUs are churning! Thanks for your help!
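For anyone finding this later, the shape of the split can be sketched as below. This is not the exact code from our deployment: `launch_per_file`, `fake_launch`, and the `gs://example-bucket/...` paths are hypothetical names for illustration. The launcher is passed in as a plain function so the sketch runs without a cluster; in a real setup it would presumably wrap whatever call starts a launch plan execution (for example via flytekit’s `FlyteRemote`), which is an assumption here.

```python
from typing import Callable, Dict, Iterable


def launch_per_file(
    input_files: Iterable[str],
    launch_one: Callable[[str], str],
) -> Dict[str, str]:
    """Start one execution per input file instead of one giant fan-out.

    `launch_one` takes a file path and returns an execution identifier.
    Returns a mapping of file path -> execution identifier.
    """
    executions: Dict[str, str] = {}
    for path in input_files:
        executions[path] = launch_one(path)
    return executions


def fake_launch(path: str) -> str:
    # Stand-in launcher (hypothetical): a real one would start a
    # launch plan execution for this file and return its name.
    return "exec-" + path.rsplit("/", 1)[-1]


# 100 hypothetical input files, matching the count mentioned in the thread.
files = [f"gs://example-bucket/input_{i:03d}.dat" for i in range(100)]
executions = launch_per_file(files, fake_launch)
print(len(executions), "executions started")
```

Each execution then holds only the tasks for a single file (1,280 here) rather than all 128,000 in one graph.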
k
😄
❤️