# flyte-support
@square-carpet-13590
hi team, any pointers or guidelines that will help us scale Flyte Propeller? Attaching the metrics below. The issue is that not all workflows are getting picked up by workers: more than 60-65% of workers are available, while workflow acceptance and node transition latencies are on the high side. FlytePropeller has been scaled to 3 shards and resource utilisation is low. Below is the propeller config as well. Thank you!
```yaml
core:
    propeller:
      rawoutput-prefix: will-be-replaced
      workers: 60
      gc-interval: 2h
      max-workflow-retries: 50
      workflow-reeval-duration: 7s
      downstream-eval-duration: 3s
      max-streak-length: 10
      kube-client-config:
        qps: 200
        burst: 50
        timeout: 30s
      queue:
        type: batch
        batching-interval: 2s
        batch-size: -1
        queue:
          type: maxof
          rate: 200
          capacity: 2000
          base-delay: 5s
          max-delay: 120s
        sub-queue:
          type: bucket
          rate: 100
          capacity: 1000
      workflowStore:
        policy: ResourceVersionCache
      storage:
        cache:
          max_size_mbs: 1024
          target_gc_percent: 60
```
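For reference, the "3 shards" mentioned above are typically configured through the FlytePropeller manager. A minimal sketch, assuming the hash shard strategy; key names follow the FlytePropeller manager config, but verify against your chart version:

```yaml
# Hypothetical manager sharding config; the strategy type and
# shard-count are illustrative, check the FlytePropeller manager
# docs for your version.
manager:
  shard:
    type: Hash       # consistent-hash workflows across shards
    shard-count: 3   # one FlytePropeller instance per shard
```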
@freezing-airport-6809
Not sure how we can help; something seems to be wrong. May I recommend Flyte support by Union, as we would need to take a deeper look.
@square-carpet-13590
cc @glamorous-rainbow-77959
@glamorous-rainbow-77959
@freezing-airport-6809 not asking for a complete solution, maybe just a direction we can dig into.
And we will definitely evaluate support services if you have them for custom Flyte deployments; maybe you or someone from the Union team can DM me the details.
@clean-glass-36808
The key metric for us is the unprocessed queue depth versus the worker count. From my experience (i.e., last week), we've seen the unprocessed queue depth increase while workers are available when FlytePropeller was getting CPU throttled. It only seems to happen under high load.
[Attachments: Screenshot 2025-04-04 at 10.00.45 AM.png, Screenshot 2025-04-04 at 10.00.08 AM.png]
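A hedged sketch of how one could watch for that correlation in Prometheus. The cAdvisor throttling metrics are standard; the Propeller free-workers metric name is an assumption based on its default metric scope, so verify it against your /metrics endpoint:

```yaml
# Illustrative Prometheus recording rules, not a tested config.
groups:
  - name: flytepropeller-saturation
    rules:
      # Fraction of CFS periods in which the propeller container was throttled.
      - record: flytepropeller:cpu_throttled_ratio
        expr: |
          rate(container_cpu_cfs_throttled_periods_total{pod=~"flytepropeller-.*"}[5m])
            /
          rate(container_cpu_cfs_periods_total{pod=~"flytepropeller-.*"}[5m])
      # Assumed Propeller gauge for idle workers; name may differ per deployment.
      - record: flytepropeller:free_workers
        expr: flyte:propeller:all:free_workers_count
```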
@square-carpet-13590
Thank you @clean-glass-36808 for the input, let me check in this direction
@freezing-airport-6809
Ohh, do you have very few CPUs allocated?
@glamorous-rainbow-77959
@freezing-airport-6809 you mean allocated to FlytePropeller? No, I think we have a decent amount. @square-carpet-13590, could you clarify?
@square-carpet-13590
@freezing-airport-6809 we have set request=1 CPU and limit=2 CPU per pod; there is some amount of throttling, but not a lot.
@freezing-airport-6809
1 CPU is small, but I don't like a 1-request/2-limit split, as it will cause throttling, as Jason said.
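A minimal sketch of that change, assuming a flyte-core style Helm values file; the exact key path and the memory figures are assumptions for your chart, and the CPU figure follows the "keep it at 2" advice below:

```yaml
# Hypothetical Helm values snippet: equal request and limit gives the
# pod Guaranteed QoS and avoids CFS throttling under bursts. Key path
# and memory values are assumptions, adjust to your deployment.
flytepropeller:
  resources:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      cpu: "2"
      memory: 4Gi
```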
@square-carpet-13590
got it, will try with more CPUs. Thank you
@freezing-airport-6809
Keep it at 2.
But that's not the problem.
If things are not getting picked up, it has to be something else.
@square-carpet-13590
yes, could be. Maybe the kube-client-config; not sure the values we set above are sufficient.
Since throttling was low, I ruled out CPU as the issue.
@freezing-airport-6809
200 QPS should be ok depending on the load.
But you also have 3 propellers.
So that's 600 QPS.
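One detail worth double-checking in the config above: burst (50) is below qps (200), and client-go uses qps as the sustained rate with burst as the token-bucket size, so short spikes get capped below the nominal QPS. A sketch with burst above qps; the numbers are illustrative, not tuned recommendations:

```yaml
# Illustrative kube-client-config; values are assumptions to tune
# against your API server capacity. Typical guidance is burst >= qps,
# since burst bounds how many requests can fire at once.
kube-client-config:
  qps: 200
  burst: 300
  timeout: 30s
```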
@square-carpet-13590
okay