Flyte #flyte-support

gorgeous-waitress-5026

11/29/2023, 3:09 PM

Hi folks -- just watching resources in my cluster and I'm seeing that flytepropeller pods seem to be terminating / restarting due to leader election failures like:

{"json":{},"level":"fatal","msg":"Lost leader state. Shutting down.","ts":"2023-11-29T150420Z"}

Is this common... or indicative of something that could be misbehaving within the cluster? (The cluster is mostly idling)

calm-pilot-2010

11/29/2023, 4:01 PM

I had the same issue. I think it happens because of timeouts on requests to the kube-apiserver. I think the default config is:

lease-duration: 15s
renew-deadline: 10s
retry-period: 2s

So it periodically makes a request to the kube-apiserver to renew the lease, which lasts 15 seconds. If it can't get a successful response within 10 seconds then flytepropeller will restart. I changed to using:

lease-duration: 120s
renew-deadline: 110s
retry-period: 5s

This seems to have helped quite a lot in our use case.
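
As a rough sanity check on the two sets of values above, here is a minimal Python sketch. It assumes flytepropeller's settings follow the same rules as Kubernetes client-go leader election: lease-duration must exceed renew-deadline, and renew-deadline must exceed 1.2 times retry-period. The class and method names are hypothetical and are not part of Flyte.

```python
from dataclasses import dataclass
from typing import List

# Assumption: jitter factor used by Kubernetes client-go leader election.
JITTER_FACTOR = 1.2


@dataclass(frozen=True)
class LeaderElectionTiming:
    """Hypothetical helper holding the three timing values, in seconds."""

    lease_duration_s: float
    renew_deadline_s: float
    retry_period_s: float

    def problems(self) -> List[str]:
        """Return the violated constraints (empty list if the values are consistent)."""
        issues = []
        if self.lease_duration_s <= self.renew_deadline_s:
            issues.append("lease-duration must be greater than renew-deadline")
        if self.renew_deadline_s <= JITTER_FACTOR * self.retry_period_s:
            issues.append("renew-deadline must be greater than 1.2 * retry-period")
        return issues

    def renew_attempts_before_deadline(self) -> int:
        """Approximate number of renewal retries that fit inside renew-deadline."""
        return int(self.renew_deadline_s // self.retry_period_s)


defaults = LeaderElectionTiming(15, 10, 2)    # defaults quoted in this thread
relaxed = LeaderElectionTiming(120, 110, 5)   # relaxed values quoted in this thread

for name, timing in (("defaults", defaults), ("relaxed", relaxed)):
    print(name, timing.problems(), timing.renew_attempts_before_deadline())
```

Under these assumptions both sets pass the checks. The relaxed values simply leave many more retries (and far more wall-clock time) before the leader gives up and the pod exits.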

calm-pilot-2010

11/29/2023, 4:02 PM

https://flyte-org.slack.com/archives/CP2HDHKE1/p1699620779742009

hallowed-mouse-14616

11/29/2023, 5:31 PM

If there are lingering issues here please submit a PR to update the defaults.

calm-pilot-2010

11/29/2023, 5:59 PM

You mean make my configuration the default? I assumed the current defaults would be more sensible in most use cases. The downside of my config is that it can create a delay of up to 230s when making new deployments.
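
The 230s figure matches lease-duration plus renew-deadline (120s + 110s). Treating that sum as the worst-case handover delay is an assumption here, not something confirmed in the thread, and the function name is hypothetical. A small sketch comparing the two configurations:

```python
def worst_case_handover_s(lease_duration_s: int, renew_deadline_s: int) -> int:
    """Assumed upper bound on the wait before a new pod can take over the lease.

    Assumption: the bound is lease-duration + renew-deadline, which reproduces
    the 230s quoted above for the relaxed settings.
    """
    return lease_duration_s + renew_deadline_s


default_delay = worst_case_handover_s(15, 10)     # default settings from the thread
relaxed_delay = worst_case_handover_s(120, 110)   # relaxed settings from the thread
print(default_delay, relaxed_delay)
```

So the trade-off is fewer restarts from slow kube-apiserver responses in exchange for a longer possible wait when a new flytepropeller pod has to take over.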

gorgeous-waitress-5026

11/29/2023, 6:09 PM

Thanks @calm-pilot-2010! I happened to catch that thread, but didn't see you had changed the polling frequency -- that's super useful.