#ask-the-community

Thomas Newton

11/10/2023, 12:52 PM
Hello. Could anyone explain the importance of leader election for flytepropeller? We sometimes see flytepropeller restarting due to failures to renew the lease.
E1106 17:54:54.406295       1 leaderelection.go:369] Failed to update lock: Put "https://10.0.0.1:443/apis/coordination.k8s.io/v1/namespaces/infrastructure--helm--flyte/leases/propeller-leader?timeout=2m0s": context deadline exceeded
I1106 17:54:54.406322       1 leaderelection.go:285] failed to renew lease infrastructure--helm--flyte/propeller-leader: timed out waiting for the condition
{"json":{},"level":"fatal","msg":"Lost leader state. Shutting down.","ts":"2023-11-06T17:54:54Z"}
Given that we are not currently using a sharded flytepropeller, I think we may be able to disable leader election entirely?

Dan Rammer (hamersaw)

11/10/2023, 1:01 PM
Leader election is used to keep a hot duplicate propeller env. If one fails, k8s leader election allows the second to pick up very quickly. Only one propeller should be active at a time; otherwise they will compete with each other and cause issues such as creating duplicate Pods for task executions and updating the FlyteWorkflow CRD simultaneously.
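To make this concrete, here is a deliberately simplified, in-memory Python model of lease-based leader election. The `Lease` class and the `try_acquire_or_renew` and `simulate` functions are hypothetical illustrations, not flytepropeller or client-go code. The real client-go implementation stores the lease in a `coordination.k8s.io` Lease object and differs in details such as jitter and how expiry is judged. The 15-second lease and 2-second polling interval are illustrative values only. The model shows the two properties described above: only the lease holder does work, and a standby replica takes over once the holder stops renewing and its lease expires.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Lease:
    """Hypothetical in-memory stand-in for a Kubernetes Lease object."""
    holder: Optional[str] = None
    renew_time: float = 0.0
    lease_duration: float = 15.0  # seconds; illustrative value


def try_acquire_or_renew(lease: Lease, candidate: str, now: float) -> bool:
    """One election round for `candidate`; True if it holds the lease afterwards."""
    if lease.holder == candidate:
        # The current leader stays leader by refreshing the renew time.
        lease.renew_time = now
        return True
    expired = lease.holder is None or now >= lease.renew_time + lease.lease_duration
    if expired:
        # Nobody holds a valid lease, so this candidate takes over.
        lease.holder = candidate
        lease.renew_time = now
        return True
    # Another replica holds a valid lease: stay on hot standby and do no work.
    return False


def simulate(crash_at: float, retry_period: float = 2.0,
             until: float = 40.0) -> List[Tuple[float, Optional[str]]]:
    """Replicas 'a' and 'b' poll every `retry_period`; 'a' crashes at `crash_at`.

    Returns (time, leader) after each polling round.
    """
    lease = Lease()
    history: List[Tuple[float, Optional[str]]] = []
    t = 0.0
    while t <= until:
        for name in ("a", "b"):
            if name == "a" and t >= crash_at:
                continue  # the crashed replica no longer renews
            try_acquire_or_renew(lease, name, t)
        history.append((t, lease.holder))
        t += retry_period
    return history


if __name__ == "__main__":
    for t, leader in simulate(crash_at=10.0):
        print(f"t={t:>4.0f}s leader={leader}")
```

In this model replica `a` last renews at t=8s and `b` takes over at t=24s, the first poll after the 15-second lease has expired. The handover time is therefore bounded by the lease settings, not by how long it takes to reschedule a Pod.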

Thomas Newton

11/10/2023, 1:04 PM
Thanks. It makes sense that multiple concurrent flytepropellers would cause problems. If I have a non-sharded flyte propeller with only one replica and rolling updates disabled though, then I should anyway never have more than one flyte propeller running simultaneously? How would the duplicate propeller env you describe be configured?

Dan Rammer (hamersaw)

11/10/2023, 1:39 PM
> then I should anyway never have more than one flyte propeller running simultaneously?

Correct!

> How would the duplicate propeller env you describe be configured?
I know that for a long time the default deployment charts just set `replicas: 2` and enabled leader election. Then k8s starts two propeller instances automatically, and the leader-election mechanism ensures only one is active at a time.
This is only necessary for specific use cases, though. If there is only a single replica and the Pod fails, k8s will recreate it as long as it is managed by a Deployment, ReplicaSet, etc. There will be a little downtime while that happens, and the only downside is that workflows will not be able to schedule new tasks in the meantime. A leader-election handover is much quicker.

Thomas Newton

11/10/2023, 1:57 PM
Thanks, that's really useful information. I think the `flyte-core` Helm chart is currently set to one replica but with rolling updates. I've also just discovered that the update strategy is not configurable through the `flyte-core` Helm chart. It would probably be a trivial Helm PR, but I think I will just make the lease and renew periods a bit longer to mitigate our problem.
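When lengthening these periods, keep in mind that client-go's leader elector rejects timings that break a fixed ordering. The lease duration must exceed the renew deadline, and the renew deadline must exceed the retry period multiplied by client-go's jitter factor of 1.2. The sketch below mirrors those checks in Python. `timing_problems` is a hypothetical helper. Its arguments are assumed to correspond to flytepropeller's `leader-election` settings `lease-duration`, `renew-deadline` and `retry-period`, and that mapping should be confirmed against the Flyte configuration docs.

```python
from typing import List

# client-go's leaderelection.JitterFactor, used to jitter the retry period.
JITTER_FACTOR = 1.2


def timing_problems(lease_duration: float, renew_deadline: float,
                    retry_period: float) -> List[str]:
    """Return the ordering rules (mirroring client-go's checks) that these
    timings violate; an empty list means the combination is acceptable.
    All values are in seconds."""
    problems: List[str] = []
    if min(lease_duration, renew_deadline, retry_period) <= 0:
        problems.append("all periods must be positive")
    if lease_duration <= renew_deadline:
        problems.append("lease duration must be greater than renew deadline")
    if renew_deadline <= JITTER_FACTOR * retry_period:
        problems.append("renew deadline must be greater than retry period * 1.2")
    return problems


if __name__ == "__main__":
    # The common 15s/10s/2s combination, a hypothetical longer one,
    # and a change that lengthens only the renew deadline.
    for timings in [(15, 10, 2), (30, 20, 4), (15, 20, 2)]:
        print(timings, timing_problems(*timings) or "ok")
```

In practice, this means the periods have to be raised together. Lengthening only the renew deadline (for example, to 20s while the lease stays at 15s) is rejected. A hypothetical 30s/20s/4s combination keeps the same ordering and lets the leader tolerate a longer API-server stall before it gives up the lease.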

Ketan (kumare3)

11/11/2023, 1:48 AM
cc @Nikki Everett, can we add this to the docs somehow?

Nikki Everett

11/13/2023, 7:51 PM
Yeah, I can make an issue to add this to the deployment config docs.