# ask-the-community
l
Hello, we've been running two instances of the flyte propeller, but we noticed a disparity in memory consumption between the two instances. Flyte 1.9.1 is installed on GKE via helm. Looking at the past 2 months, it seems one of the propellers is underutilised or not utilised at all, since its memory doesn't go up with load/time. The earlier metrics show Flyte on 1.8.1 (before Sep 23). Some questions:
• is there something wrong that causes this uneven distribution?
• it seems that memory utilisation increases with time; is there any internal caching by the propeller that causes this behaviour?
• is any autoscaling available?
any other advice would be appreciated, thanks!
k top pod
NAME                                  CPU(cores)   MEMORY(bytes)
flytepropeller-ddb88df5-4wltg         6m           758Mi
flytepropeller-ddb88df5-j8z4c         2m           67Mi
k
Did you run propeller manager?
Propeller is leader-elected and you cannot simply run multiple copies.
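One quick way to confirm which replica actually holds the leader lock is to inspect the lock object; a rough sketch, assuming the flyte namespace (the lock name propeller-leader is just a guess, check your propeller leader-election config):
```
# Newer client-go leader election uses Lease objects; look for the holderIdentity
kubectl -n flyte get lease

# Older ConfigMap-based locks record the holder in an annotation
# ("propeller-leader" is a guess at the lock name; check your leader-election config)
kubectl -n flyte get configmap propeller-leader \
  -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'
```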
If you want to run multiple copies, that is done through sharding, which is all automatically managed by the propeller manager.
The above graph shows single-propeller mode.
Also, you should tweak some configs. A single propeller can handle 1000s of workflows per second.
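For reference, a minimal sketch of what enabling the propeller manager with hash sharding can look like in the propeller config; key names are from the Flyte scale-out docs as I remember them, so verify against the version you run:
```yaml
# Sketch only; verify key names against the Flyte docs for your version.
manager:
  pod-application: "flytepropeller"             # label applied to the managed shard pods
  pod-template-container-name: "flytepropeller"
  shard:
    type: Hash        # distribute workflows across shards by hashing
    shard-count: 3    # number of propeller shard pods the manager keeps running
```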
l
Did you run propeller manager?
no, didn't know we had to 😅, assumed it would be load-balanced once we have replicas > 1, like the other components. I'll test out the sharding, thanks for referring to the docs.
Also, you should tweak some configs. A single propeller can handle 1000s of workflows per second.
I checked the metrics but they don't show any anomalies:
flyte:propeller:all:free_workers_count
sum(rate(flyte:propeller:all:round:raw_ms[5m])) by (wf)
sum(rate(flyte:propeller:all:main_depth[5m]))
We probably have at most 20 workflows running at any time for now, on a propeller with 1Gi of memory. We'll just increase the memory and monitor further. Thanks!
• it seems that memory utilisation increases with time; is there any internal caching by the propeller that causes this behaviour?
With regards to this, I reduced the number of replicas from 2 to 1, since the other propeller isn't in use anyway. So the resources for the single replica were doubled, but the memory consumption dropped drastically. The number of running workflows after the deployment was even higher.
k
Yes, propeller maintains a lookaside-style cache for many of the data items it works with.
It will grow greedily and then hold steady.
If memory is what you were sharing, I am not worried. Just add, I would say, 4GB, change the cache in the storage config to 2G or more, and let it run with more workers, like 400.
Propellers will be fine
This is not Airflow - it will scale really well.
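For concreteness, a rough sketch of where those knobs live, using the numbers mentioned above; the exact Helm value paths that feed the flyte-propeller-config ConfigMap vary by chart version, so treat this as illustrative:
```yaml
# Illustrative values only, matching the advice above.
propeller:
  workers: 400              # more parallel workflow evaluation workers
storage:
  cache:
    max_size_mbs: 2048      # in-memory lookaside cache size (~2G)
    target_gc_percent: 70
# Plus bump the flytepropeller pod memory (set via your Helm values), e.g.:
# resources:
#   limits:
#     memory: 4Gi
```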
l
Got it. Are there any key benefits to running multiple propellers, given there is more synchronisation involved? Otherwise we can stick to a single propeller (like we have all this while).
k
The benefit is scale, when you have lots of concurrent executions.