#ask-the-community

Ethan Brown

11/29/2023, 3:09 PM
Hi folks -- just watching resources in my cluster and I'm seeing that flytepropeller pods seem to be terminating / restarting due to leader election failures like:
{"json":{},"level":"fatal","msg":"Lost leader state. Shutting down.","ts":"2023-11-29T15:04:20Z"}
Is this common... or indicative of something that could be misbehaving within the cluster? (The cluster is mostly idling)

Thomas Newton

11/29/2023, 4:01 PM
I had the same issue. I think it happens because of timeouts on requests to the kube-apiserver. I think the default config is:
```yaml
lease-duration: 15s
renew-deadline: 10s
retry-period: 2s
```
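So the leader keeps renewing its lease through the kube-apiserver, retrying every retry-period (2s). If it can't get a successful renewal within renew-deadline (10s), it loses leadership and flytepropeller restarts. The lease-duration is how long the other replicas wait before they can take over the lease. I changed to using: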
```yaml
lease-duration: 120s
renew-deadline: 110s
retry-period: 5s
```
This seems to have helped quite a lot in our use case.
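To make the timing concrete, here is a toy Python model (not flytepropeller's or client-go's actual code) comparing the defaults with the values above. It assumes flytepropeller hands these settings to Kubernetes client-go's leader election, which rejects a config unless lease-duration > renew-deadline and renew-deadline > 1.2 × retry-period. The `LeaderElectionConfig` class and `survives_outage` function are hypothetical, and the renewal loop is simplified to "one attempt every retry-period; leadership is lost once renew-deadline passes without a success".

```python
from dataclasses import dataclass

# client-go's leader election validates renew-deadline against
# retry-period scaled by this jitter factor.
JITTER_FACTOR = 1.2


@dataclass(frozen=True)
class LeaderElectionConfig:  # hypothetical helper; all values in seconds
    lease_duration: float
    renew_deadline: float
    retry_period: float

    def validate(self) -> None:
        # Ordering constraints that client-go enforces on these settings.
        if self.lease_duration <= self.renew_deadline:
            raise ValueError("lease-duration must be greater than renew-deadline")
        if self.renew_deadline <= JITTER_FACTOR * self.retry_period:
            raise ValueError("renew-deadline must be greater than 1.2 * retry-period")


def survives_outage(cfg: LeaderElectionConfig, outage_start: float, outage_len: float) -> bool:
    """Simplified model of the leader's renewal loop during an apiserver stall.

    The leader attempts a renewal every retry_period seconds. Attempts inside
    [outage_start, outage_start + outage_len) fail; all others succeed.
    Leadership is lost once renew_deadline seconds pass without a success.
    """
    cfg.validate()
    outage_end = outage_start + outage_len
    last_success = 0.0
    t = 0.0
    # Run until at least one attempt lands after the outage has ended.
    while t <= outage_end + cfg.retry_period:
        if outage_start <= t < outage_end:
            # Failed renewal: give up once the deadline has elapsed.
            if t - last_success >= cfg.renew_deadline:
                return False
        else:
            last_success = t
        t += cfg.retry_period
    return True


defaults = LeaderElectionConfig(lease_duration=15, renew_deadline=10, retry_period=2)
relaxed = LeaderElectionConfig(lease_duration=120, renew_deadline=110, retry_period=5)

# A 12s stall in apiserver responses starting at t=30s:
print(survives_outage(defaults, outage_start=30, outage_len=12))  # False
print(survives_outage(relaxed, outage_start=30, outage_len=12))   # True
```

Under these assumptions the defaults tolerate only about 10s of failed renewals, while the relaxed values ride out stalls of up to about 110s, at the cost of a longer wait before a standby replica can take over.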

Dan Rammer (hamersaw)

11/29/2023, 5:31 PM
If there are lingering issues here, please submit a PR to update the defaults.

Thomas Newton

11/29/2023, 5:59 PM
You mean make my configuration the default? I assumed the current defaults would be more sensible in most use cases. The downside of my config is that it will create a delay of up to 230s when making new deployments.
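The trade-off can be put in numbers with a hypothetical helper (not part of Flyte). It assumes the leader tolerates an apiserver stall of roughly renew-deadline, and it reads the 230s figure as lease-duration + renew-deadline (120s + 110s), which is one combination that reproduces it:

```python
# Hypothetical helper, not part of Flyte. Assumptions:
#  - tolerated apiserver stall ~= renew-deadline
#  - worst-case handover delay ~= lease-duration + renew-deadline
#    (one reading that reproduces the "up to 230s" figure)
def tradeoff(lease_duration_s: int, renew_deadline_s: int) -> dict:
    return {
        "tolerated_stall_s": renew_deadline_s,
        "worst_case_handover_s": lease_duration_s + renew_deadline_s,
    }


configs = {"default": (15, 10), "relaxed": (120, 110)}
for name, (lease, renew) in configs.items():
    print(name, tradeoff(lease, renew))
# default {'tolerated_stall_s': 10, 'worst_case_handover_s': 25}
# relaxed {'tolerated_stall_s': 110, 'worst_case_handover_s': 230}
```

Under the same reading, the defaults cap the delay at about 25s, so they favor fast failover over tolerance of slow apiserver responses.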

Ethan Brown

11/29/2023, 6:09 PM
Thanks @Thomas Newton! I happened to catch that thread, but didn't see you had changed the polling frequency -- that's super useful.