# ask-the-community
t
Hello. Could anyone explain the importance of leader election for flytepropeller? We sometimes see flytepropeller restarting due to failures to renew the lease.
E1106 17:54:54.406295       1 leaderelection.go:369] Failed to update lock: Put "https://10.0.0.1:443/apis/coordination.k8s.io/v1/namespaces/infrastructure--helm--flyte/leases/propeller-leader?timeout=2m0s": context deadline exceeded
I1106 17:54:54.406322       1 leaderelection.go:285] failed to renew lease infrastructure--helm--flyte/propeller-leader: timed out waiting for the condition
{"json":{},"level":"fatal","msg":"Lost leader state. Shutting down.","ts":"2023-11-06T17:54:54Z"}
Given that we are not currently using a sharded flyte-propeller, I'm thinking we may be able to disable leader election entirely?
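(For anyone finding this later: leader election is toggled in the FlytePropeller config. A minimal sketch of disabling it, assuming the key names from the FlytePropeller configuration reference; verify against your deployed version.)

```yaml
# FlytePropeller config sketch: opt a single-replica deployment
# out of leader election entirely
propeller:
  leader-election:
    enabled: false
```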
d
Leader election is used to keep a hot standby propeller instance. If one fails, Kubernetes leader election allows the second to take over very quickly. Only one propeller should be active at a time; otherwise they will compete with each other and cause issues, e.g. creating duplicate Pods for task executions, updating the FlyteWorkflow CRD simultaneously, etc.
t
Thanks. It makes sense that multiple concurrent flytepropellers would cause problems. If I have a non-sharded flyte propeller with only one replica and rolling updates disabled though, then I should anyway never have more than one flyte propeller running simultaneously? How would the duplicate propeller env you describe be configured?
d
then I should anyway never have more than one flyte propeller running simultaneously?
Correct!
How would the duplicate propeller env you describe be configured?
For a long time the default deployment charts just set
replicas: 2
and enabled leader election. Kubernetes then starts 2 propeller instances automatically, and the leader-election mechanism ensures only one is active at a time.
This is only necessary for specific use cases though. If there is only a single replica and the Pod fails, Kubernetes will recreate it as long as it's managed by a Deployment / ReplicaSet / etc. There will be a little downtime during that transition, and the only impact is that workflows won't be able to schedule new tasks in the meantime. Failing over via leader election is much quicker.
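(A hypothetical flyte-core values sketch for the hot-standby setup described above; the exact key paths may differ between chart versions, so treat this as illustrative, not authoritative.)

```yaml
# flyte-core values sketch: two propeller replicas with
# leader election so only one is active at a time
flytepropeller:
  replicas: 2
configmap:
  core:
    propeller:
      leader-election:
        enabled: true
        lock-config-map:
          name: propeller-leader   # name of the lock object
          namespace: flyte         # namespace it lives in
```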
t
Thanks, that's really useful information. I think the flyte-core helm chart is currently set to one replica but with rolling updates. I've also just discovered the update strategy is not configurable through the
flyte-core
helm chart. It would probably be a trivial helm PR, but I think I will just make the lease and renew periods a bit longer to mitigate our problem.
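(Sketch of what lengthening those timings might look like. The key names follow the FlytePropeller configuration reference; the durations are illustrative assumptions, not recommendations — the underlying client-go defaults are on the order of a 15s lease with a 10s renew deadline.)

```yaml
# FlytePropeller config sketch: longer leader-election timings to
# tolerate slow API-server responses when renewing the lease
propeller:
  leader-election:
    enabled: true
    lease-duration: 60s   # how long a lease is held before it can be stolen
    renew-deadline: 45s   # must renew within this window or give up leadership
    retry-period: 10s     # how often the client retries lease operations
```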
k
Cc @Nikki Everett can we add this to docs somehow
n
yeah, i can make an issue to add to the deployment config docs