Currently, our GCP account can support about 2,500 to 3,000 tasks in parallel, based on the CPU/memory requirements of these Flyte tasks.
When the workflow started, FlytePropeller scheduled and ran about 500 tasks in parallel.
I assumed it would ramp up over time as new nodes were provisioned.
However, after about 30 minutes the number of concurrent pods dropped to around 150, and it has stayed there ever since (over 60 minutes now).
We currently have about 40 nodes running, and they are only using about 25% of their CPU/memory, so I'm not sure why FlytePropeller isn't scheduling more tasks.
I also checked the CPU/memory quota in the namespace, and usage is well under the limit.
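A rough back-of-the-envelope check of the numbers above, as a sketch only. It assumes that CPU/memory utilization on the existing nodes scales linearly with the number of running task pods, which ignores daemonsets, system pods, and per-node overhead. The function name `pods_at_full_utilization` is hypothetical and only used for illustration.

```python
# Back-of-the-envelope headroom check using the figures from this thread.
# Assumption: node utilization scales linearly with the number of running
# task pods (ignores daemonsets, system pods, and other per-node overhead).

def pods_at_full_utilization(running_pods: int, utilization: float) -> float:
    """Hypothetical helper: estimate how many pods the current nodes could
    hold at 100% utilization, given the current pod count and utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return running_pods / utilization


observed_pods = 150               # concurrent pods after the drop
node_utilization = 0.25           # ~25% CPU/memory across ~40 nodes
account_capacity = (2500, 3000)   # estimated parallel-task capacity of the account

# How many pods the already-provisioned nodes could plausibly hold.
fit_on_current_nodes = pods_at_full_utilization(observed_pods, node_utilization)

# Observed concurrency as a fraction of the account's estimated capacity.
fraction_of_capacity = [observed_pods / c for c in account_capacity]

print(f"Current nodes could hold roughly {fit_on_current_nodes:.0f} pods")
print("Observed pods as a share of account capacity: "
      + ", ".join(f"{f:.0%}" for f in fraction_of_capacity))
```

Under that linear assumption, the existing nodes alone would have room for several times the observed 150 pods, and 150 is only a small fraction of the 2,500 to 3,000 the account should support, which is why the plateau points away from node capacity or namespace quota as the bottleneck.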