Hi folks -- just watching resources in my cluster ...
# ask-the-community
e
Hi folks -- just watching resources in my cluster and I'm seeing that flytepropeller pods seem to be terminating / restarting due to leader election failures like:
{"json":{},"level":"fatal","msg":"Lost leader state. Shutting down.","ts":"2023-11-29T150420Z"}
Is this common... or indicative of something that could be misbehaving within the cluster? (The cluster is mostly idling)
t
I had the same issue. I think it happens because of timeouts on requests to the kube-apiserver. I think the default config is
lease-duration: 15s
renew-deadline: 10s
retry-period: 2s
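So, as I understand it, the current leader retries renewing the lease against the kube-apiserver every 2 seconds (retry-period). If it can't get a successful renewal within 10 seconds (renew-deadline), flytepropeller gives up leadership and restarts. The 15-second lease-duration is how long other candidates wait before trying to take over. I changed to using: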
lease-duration: 120s
renew-deadline: 110s
retry-period: 5s
This seems to have helped quite a lot in our use case.
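For anyone wondering where these keys go: a rough sketch of how the block might sit in the flytepropeller config, assuming the usual `propeller.leader-election` section. The `lock-config-map` name and namespace are placeholders I made up for illustration, and with the Helm charts the exact nesting in your values file may differ, so check it against your own deployment.

```yaml
# Sketch only: assumes leader election is configured under propeller.leader-election.
propeller:
  leader-election:
    enabled: true
    lock-config-map:
      name: propeller-leader   # placeholder, use your own lock name
      namespace: flyte         # placeholder, use your own namespace
    # Values from this thread. lease-duration must stay greater than
    # renew-deadline, and renew-deadline well above retry-period.
    lease-duration: 120s
    renew-deadline: 110s
    retry-period: 5s
```

The trade-off is the same as with any lease-based election: longer durations tolerate a slow kube-apiserver better, but a replacement pod waits longer before it can take over the lease.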
d
If there are lingering issues here please submit a PR to update the defaults.
t
You mean make my configuration the default? I assumed the current defaults would be more sensible in most use cases. The downside of my config is that it will create a delay of up to 230s when making new deployments.
e
Thanks @Thomas Newton! I happened to catch that thread, but didn't see you had changed the polling frequency -- that's super useful.