# flyte-on-gcp
m
I'm running Flyte on GKE with nginx ingress and am currently migrating auth to Okta (I previously had auth working with Auth0). I followed all the steps in the Authentication setup guide for configuring flyte-core with an external auth server on Okta. Console auth and flytectl are working, and all pods are running without errors except flytescheduler, which gives a somewhat ambiguous error in the `flytescheduler-check` init container:
```
panic: rpc error: code = Unauthenticated desc = token parse error [JWT_VERIFICATION_FAILED] Could not retrieve id token from metadata, caused by: rpc error: code = Unauthenticated desc = Request unauthenticated with IDToken
```
I've triple-checked the flytepropeller auth setup with client credentials in Okta and haven't been able to resolve the issue. I figured I'd ask whether anyone has pointers for troubleshooting this. Thanks!
d
@Mark Waylonis I think it has to do with a recent change where a conditional was missing. This was fixed here, but the fix isn't in a release yet.
m
In my case, the secrets are managed by Helm:
```yaml
adminOauthClientCredentials:
    enabled: true
```
So I don't think this change should affect my configuration (though I may be misunderstanding something). I don't have the same issue the PR links to. I've also confirmed that the `flyte-secret-auth` secret is created with the correct flytepropeller `client_secret`, and, from describing the flytescheduler pod, the init container also appears to have it mounted:
```
Mounts:
      /etc/db from db-pass (rw)
      /etc/flyte/config from config-volume (rw)
      /etc/secrets/ from auth (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-m4qwb (ro)
```
d
Is the above mount from the `flytescheduler` pod?
m
yes
from the flytescheduler-check init container for that pod
d
Typically there's no need to, but have you tried a manual restart of the `flytescheduler` deployment?
```shell
kubectl rollout restart deployment/flytescheduler -n <your-namespace>
```
m
Yes. It doesn't seem like the error is in the secret mounting, but rather in the actual auth?
I'm going to get a Flyte dev environment set up to see if I can get a more useful error message logged. Based on what I have right now, my hunch is that it's an issue with the Okta configuration.
d
Thanks for sharing, and sorry if we're not helping enough. That init container basically tests its connection to its dependencies (the DB and flyteadmin). Somehow it's failing to authenticate to flyteadmin.
m
Oh, no worries. I really appreciate you taking a look. I wanted to see if there was an easy fix before I dive into more troubleshooting. This is a good excuse to get the dev environment set up, which I've been meaning to do for a while.
I was able to get to the bottom of this. It was due to a mismatch between the audience flyteadmin expects and the `aud` claim in the access token. flyteadmin uses this function https://github.com/flyteorg/flyte/blob/cf7c638497224aa60912967df529182b4c41b5d5/flyteadmin/auth/handler_utils.go#L114 to determine the audience, which it derives from the incoming request. In my case, I'm running flytepropeller/flytescheduler in the same cluster as the control plane, so they send requests to flyteadmin via `http://flyteadmin:80`. Therefore, when validating the JWT audience, flyteadmin expects `http://flyteadmin:80` to be the audience instead of the external-facing domain name. This was easily fixed by setting `allowedAudience` in the values file, which adds the external-facing domain name to the expected audiences.
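To make the failure mode concrete, here is a simplified Python model of the audience check as described above: the audience derived from the request is always expected, and `allowedAudience` adds further accepted values. This is a sketch, not flyteadmin's Go implementation (see the linked `handler_utils.go` for that); the function names and `https://flyte.example.com` are hypothetical stand-ins.

```python
from typing import Iterable, Set, Union


def audience_from_request(scheme: str, host: str) -> str:
    """Hypothetical stand-in for deriving the audience from an incoming request.

    In-cluster callers reach flyteadmin as http://flyteadmin:80, so that is what
    gets derived here, not the external domain name.
    """
    return f"{scheme}://{host}"


def expected_audiences(request_audience: str, allowed_audience: Iterable[str]) -> Set[str]:
    # The request-derived audience is always accepted; allowedAudience extends the set.
    return {request_audience, *allowed_audience}


def audience_ok(token_aud: Union[str, Iterable[str]], expected: Set[str]) -> bool:
    # Per RFC 7519, "aud" may be a single string or an array of strings.
    auds = [token_aud] if isinstance(token_aud, str) else list(token_aud)
    return any(a in expected for a in auds)


if __name__ == "__main__":
    internal = audience_from_request("http", "flyteadmin:80")
    token_aud = "https://flyte.example.com"  # hypothetical external domain in the token

    # Without allowedAudience, only http://flyteadmin:80 is expected, so this fails.
    print(audience_ok(token_aud, expected_audiences(internal, [])))  # False

    # With allowedAudience set to the external domain, the token is accepted.
    print(audience_ok(token_aud, expected_audiences(internal, ["https://flyte.example.com"])))  # True
```

Because the model only widens the accepted set rather than replacing the request-derived audience, it also illustrates why setting `allowedAudience` is not expected to break callers that already send a matching audience.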
d
@Mark Waylonis thanks for sharing! So you had to set your endpoint address as the `allowedAudience`?
m
yes
```yaml
appAuth:
  authServerType: External
  externalAuthServer:
    allowedAudience: ["https://flyte.<DOMAIN>.com"]
```
Want me to open a PR to update the Okta flyte-core documentation for external auth? As far as I understand, this setting won't adversely affect other topologies such as multi-cluster.
d
yes, let's do it!
m
OK cool, I'll follow up with a PR later today.