We're seeing flytepropeller auth failures after in...
# announcements
s
We're seeing flytepropeller auth failures after installation/upgrade (via helm) with a message like this:
E1220 16:10:36.391892       1 workers.go:102] error syncing 'mandant1-development/f3359d6b5cd941830000': Workflow[] failed. ErrorRecordingError: failed to publish event, caused by: EventSinkError: Error sending event, caused by [rpc error: code = Unauthenticated desc = token parse error [JWT_VERIFICATION_FAILED] Could not retrieve id token from metadata, caused by: rpc error: code = Unauthenticated desc = Request unauthenticated with IDToken]
It leaves the system in a weird state because there's no hard failure. flytepropeller is running after all, so our monitoring does not trigger an alarm, but no workflow execution is happening. After a flytepropeller restart, it suddenly starts working again. So after some digging I found out why this happens in our setup:
`flyte-secret-auth` is populated with `.Values.secrets.adminOauthClientCredentials.clientSecret` during installation, which is set to the placeholder `foobar` in `values.yaml`. We set the secret value dynamically, though, with a helm hook during installation because we need to fetch the real client secret from Keycloak. That happens only after flytepropeller is deployed, and it seems that flytepropeller does not reload the secret on changes. Since `flyte-secret-auth` is managed by helm, this happens again on every `helm upgrade`. I see mainly two (non-exclusive) ways to improve this behavior:
• Remove the default `clientSecret` and only create `flyte-secret-auth` via helm if the value is actually set. Only mount `flyte-secret-auth` if external auth is enabled. That would cause flytepropeller to fail to start until `flyte-secret-auth` is created by other means.
• Trigger a flytepropeller reload when `flyte-secret-auth` changes.
Any thoughts on this? Happy to contribute here but I'd like to discuss the best way forward with you first.
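To make the second option a bit more concrete, here is a rough sketch of the idea in Python. It is not flytepropeller code: flytepropeller is written in Go and I have not checked how it reads the secret. It assumes the secret is mounted as a regular volume file (not via `subPath` or an environment variable, which are not updated in a running pod), and `SecretWatcher`, `fingerprint`, and the `on_change` callback are hypothetical names made up for illustration. The watcher remembers a hash of the file and fires the callback when the content differs on a later check.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def fingerprint(path: Path) -> Optional[str]:
    """Return the SHA-256 hex digest of the file, or None if it does not exist."""
    try:
        return hashlib.sha256(path.read_bytes()).hexdigest()
    except FileNotFoundError:
        return None


class SecretWatcher:
    """Hypothetical helper: detects content changes of a mounted secret file."""

    def __init__(self, path: str, on_change: Callable[[], None]) -> None:
        self.path = Path(path)
        self.on_change = on_change
        # Remember what the secret looked like at startup (e.g. the placeholder).
        self._last = fingerprint(self.path)

    def check(self) -> bool:
        """Compare the current content with the last seen one.

        Calls on_change and returns True if the content changed,
        otherwise returns False without calling anything.
        """
        current = fingerprint(self.path)
        if current == self._last:
            return False
        self._last = current
        self.on_change()
        return True
```

Something would call `check()` periodically (or from a file-system notification), and `on_change` would rebuild the auth client or exit the process so that Kubernetes restarts the pod. Comparing a hash instead of the modification time avoids spurious reloads when the file is rewritten with the same value.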
h
Thank you for reporting this! I think 2 makes sense to do... I also think maybe we should add an `enabled: false` option to adminOauthClient for anyone who wants to manage creds completely outside helm
👍 1
s
Thanks for your suggestion @Haytham Abuelfutuh. I just created a PR that adds a new `enabled` option for adminOauthClientCredentials: https://github.com/flyteorg/flyte/pull/1976
👍 1