# flyte-support
c
Hello. I just noticed some undesirable behavior in our non-production environment and was wondering if it could be improved. I was messing with the enabled plugins for propeller and pushed a bad config. I saw the new pod come up and pass health checks, and then the old pod went down. However, it looks like after the pod passed health checks it got into a crash loop while loading a plugin that was misconfigured. It seems like the readiness checks should block until the plugin initialization sequence is complete (at least). Our rolling upgrade resulted in a single crash-looping pod and no flyte propeller functionality.
{"json":{},"level":"panic","msg":"failed to load plugin - ray-armada: [PluginInitializationFailed] Failed to create gRPC connection, caused by: failed to exit idle mode: passthrough: received empty target in Build()","ts":"2025-01-30T22:26:35Z"}
panic: (*logrus.Entry) 0xc00026a150

goroutine 468 [running]:
github.com/sirupsen/logrus.(*Entry).log(0xc00026a070, 0x0, {0xc0001b0180, 0xb5})
        /go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:260 +0x491
github.com/sirupsen/logrus.(*Entry).Log(0xc00026a070, 0x0, {0xc00148a080?, 0x0?, 0x0?})
        /go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:304 +0x48
github.com/sirupsen/logrus.(*Entry).Panic(0x34e6ff8?, {0xc00148a080?, 0x26a3020?, 0xc000018701?})
        /go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:342 +0x25
github.com/flyteorg/flyte/flytestdlib/logger.Panic({0x34e6ff8?, 0xc0000d2140?}, {0xc00148a080, 0x1, 0x1})
        /go/src/github.com/flyteorg/flytestdlib/logger/logger.go:144 +0x47
github.com/flyteorg/flyte/flytepropeller/pkg/controller.(*Controller).onStartedLeading.func1()
        /go/src/github.com/flyteorg/flytepropeller/pkg/controller/controller.go:135 +0x99
created by github.com/flyteorg/flyte/flytepropeller/pkg/controller.(*Controller).onStartedLeading in goroutine 464
        /go/src/github.com/flyteorg/flytepropeller/pkg/controller/controller.go:133 +0xb5
Ah, flytepropeller has no readiness/liveness checks so nothing stops it from deploying a broken binary/config 😕
Seems like something that could be improved. I'll file an issue.
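For the issue, here is a rough sketch of the kind of readiness gate being asked for: an HTTP handler that answers 503 until plugin initialization has finished, so a Kubernetes readiness probe pointed at it would hold the rollout while the new pod is still unready. This is not flytepropeller's actual code. `readinessGate`, `probe`, `loadPlugins`, the `/readyz` path, and the example target address are all hypothetical names made up for illustration, and the real plugin loading happens after leader election, which a real fix would have to account for.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// readinessGate (hypothetical) reports ready only after initialization has finished.
type readinessGate struct {
	ready atomic.Bool
}

// MarkReady is meant to be called once every plugin has loaded successfully.
func (g *readinessGate) MarkReady() { g.ready.Store(true) }

// ServeHTTP answers 503 until MarkReady has been called, then 200.
func (g *readinessGate) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	if !g.ready.Load() {
		http.Error(w, "plugins not initialized", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// probe issues an in-process GET against the gate and returns the status code.
func probe(g *readinessGate) int {
	rec := httptest.NewRecorder()
	g.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

// loadPlugins is a hypothetical stand-in for plugin initialization; it fails
// on an empty target, loosely mirroring the misconfiguration in the log above.
func loadPlugins(target string) error {
	if target == "" {
		return fmt.Errorf("failed to load plugin: received empty target")
	}
	return nil
}

func main() {
	gate := &readinessGate{}
	fmt.Println("before init:", probe(gate))

	// The address is a placeholder, not a real endpoint.
	if err := loadPlugins("armada.example:50051"); err != nil {
		// Stay unready instead of panicking after the probe has passed.
		fmt.Println("init failed, staying unready:", err)
		return
	}
	gate.MarkReady()
	fmt.Println("after init:", probe(gate))
}
```

With something like this wired to a readinessProbe, a bad plugin config should leave the new pod unready rather than letting the rollout replace the old one.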
f
Yes 👍