# flyte-deployment
w
Hi Flyte community! We're currently trying to deploy Flyte using the multicluster setup in GCP. We're using Okta as an external auth system (followed docs here). Our flytescheduler is in a CrashLoopBackOff. The logs show: "Request failed due to [rpc error: code = Unauthenticated desc = token parse error [JWT_VERIFICATION_FAILED] Could not retrieve id token from metadata, caused by: rpc error: code = Unauthenticated desc = Request unauthenticated with IDToken]. If it's an unauthenticated error, we will attempt to establish an authenticated context." It looks like flyteadmin uses the domain from the request to validate the JWT audience. By default, flytescheduler reaches flyteadmin at "flyteadmin:80", so flyteadmin expects that value to be the audience in the JWT issued by Okta, which is not the case. We would expect the "allowedAudience" field in the externalAuthServer section of the Helm values to be used in the audience check, but this does not seem to happen.
a
Could you share what you're using in your Helm dataplane values? By default Flyte expects everything to run inside the same K8s cluster, so `flyteadmin` is reachable. From outside the control plane cluster, data plane clusters should use the full FQDN/Ingress host name the client would use to connect to the cluster (like `mycluster.mydomain:443`). This is what this section of the docs mentions as the `endpoint` field. That FQDN happens to be the same you'd use as the `authorizedUri` in the auth setup.
c
There aren't any problems with the dataplane components authenticating. The flytescheduler component, which also runs in the same control plane cluster, is having the issue authenticating to flyteadmin. It is using the default, `flyteadmin:80`, to reach flyteadmin.
a
Right, I remembered the `flytescheduler` error later on. Could you share the Helm values you're using for the control plane cluster?
w
configmap:
  admin:
    admin:
      audience: https://flyte.<BASE_URL>
      clientId: '{{ .Values.secrets.adminOauthClientCredentials.clientId }}'
      clientSecretLocation: /etc/secrets/client_secret
      endpoint: flyteadmin:81
      insecure: true
    event:
      capacity: 1000
      rate: 500
      type: admin
  adminServer:
    auth:
      appAuth:
        authServerType: External
        thirdPartyConfig:
          flyteClient:
            clientId: <OKTA_CLIENT_ID>
            redirectUri: http://localhost:53593/callback
            scopes:
            - offline
            - all
      authorizedUris:
      - https://flyte.<BASE_URL>
      - http://flyteadmin:80
      - http://flyteadmin.flyte.svc.cluster.local:80
      externalAuthServer:
        allowedAudience:
        - https://flyte.<BASE_URL>
        - http://flyteadmin:80
        baseUrl: https://<COMPANY>.okta.com/oauth2/<SOME_ID>
        metadataUrl: .well-known/openid-configuration
      userAuth:
        openId:
          baseUrl: https://<COMPANY>.okta.com/oauth2/<SOME_ID>
          clientId: <OKTA_CLIENT_ID>
          metadataUrl: .well-known/oauth-authorization-server
          scopes:
          - profile
          - openid
          - offline_access
c
flytescheduler section is:
flytescheduler:
  runPrecheck: true
w
Here's the flytescheduler settings:
flytescheduler:
  additionalContainers: []
  additionalVolumeMounts: []
  additionalVolumes: []
  affinity: {}
  configPath: /etc/flyte/config/*.yaml
  image:
    pullPolicy: IfNotPresent
    repository: cr.flyte.org/flyteorg/flytescheduler-release
    tag: v1.13.1
  nodeSelector: {}
  podAnnotations: {}
  podEnv: {}
  podLabels: {}
  priorityClassName: ""
  resources:
    limits:
      cpu: 250m
      ephemeral-storage: 100Mi
      memory: 500Mi
    requests:
      cpu: 10m
      ephemeral-storage: 50Mi
      memory: 50Mi
  runPrecheck: true
  secrets: {}
  securityContext:
    fsGroup: 65534
    fsGroupChangePolicy: Always
    runAsNonRoot: true
    runAsUser: 1001
    seLinuxOptions:
      type: spc_t
  serviceAccount:
    annotations:
      iam.gke.io/gcp-service-account: <MY_SERVICE_ACCOUNT>
    create: true
    imagePullSecrets: []
  tolerations: []
a
What if you add `useAudienceFromAdmin: true` to the `admin` block?
w
Hi David! We tried using `useAudienceFromAdmin: true`, but we're still seeing errors in the admin logs. I checked the flytescheduler config map and the update was indeed applied.
$ kubectl logs flyteadmin
{"json":{"src":"handlers.go:309"},"level":"info","msg":"Failed to parse Access Token from context. Will attempt to find IDToken. Error: invalid audience [&{[https://flyte.<BASE_URL>] https://<COMPANY>.okta.com/oauth2/<SOME_ID> <SOME_ID> 2025-01-23 22:12:54 +0000 UTC 2025-01-22 22:12:54 +0000 UTC 0001-01-01 00:00:00 +0000 UTC AT.M-<SOME_ID>}], wanted [map[http://flyteadmin:80:{}]]","ts":"2025-01-22T22:12:54Z"}
{"json":{"src":"token.go:100"},"level":"debug","msg":"Could not retrieve id token from metadata rpc error: code = Unauthenticated desc = Request unauthenticated with IDToken","ts":"2025-01-22T22:12:54Z"}
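To make the mismatch in the log above concrete, here is a minimal Go sketch of an audience check. It is only an illustration and not flyteadmin's actual code: `audienceAllowed` is a hypothetical helper, and `https://flyte.example.com` is a placeholder standing in for `https://flyte.<BASE_URL>`.

```go
package main

import "fmt"

// audienceAllowed is a hypothetical helper (not part of Flyte). It reports
// whether at least one audience carried by the token is present in the set
// of audiences the server is willing to accept.
func audienceAllowed(tokenAud []string, wanted map[string]struct{}) bool {
	for _, aud := range tokenAud {
		if _, ok := wanted[aud]; ok {
			return true
		}
	}
	return false
}

func main() {
	// Audience in the token issued by the IdP (placeholder host).
	tokenAud := []string{"https://flyte.example.com"}

	// Audience set reported as "wanted" in the log: only the in-cluster address.
	wanted := map[string]struct{}{"http://flyteadmin:80": {}}
	fmt.Println("in-cluster audience only:", audienceAllowed(tokenAud, wanted))

	// Once the public URL is part of the wanted set, the same token matches.
	wanted["https://flyte.example.com"] = struct{}{}
	fmt.Println("public URL included:", audienceAllowed(tokenAud, wanted))
}
```

Under these assumptions the token is rejected as long as the wanted set contains only the in-cluster address, which is the situation the `invalid audience ... wanted ...` message describes.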
a
Can you remove the `audience` from the `admin` block?
w
Now my flyteadmin pod is in a CrashLoopBackoff
panic: [AUTH_CONTEXT_SETUP_FAILED] Error creating OAuth2 library configuration, caused by: secrets not found - file [/etc/secrets/oidc_client_secret], Env [FLYTE_SECRET_oidc_client_secret]
a
And is there a `client_secret` key in your `flyteadmin-secrets` Secret?
w
$ kubectl get secret flyte-admin-secrets -o yaml
apiVersion: v1
data:
  claim_symmetric_key: <SOME_VALUES>==
  cookie_block_key: <SOME_VALUES>==
  cookie_hash_key: <SOME_VALUES>=
  token_rsa_key.pem: <SOME_VALUES>=
kind: Secret
metadata:
  annotations:
    meta.helm.sh/release-name: flyte
    meta.helm.sh/release-namespace: flyte
  creationTimestamp: "2024-03-15T22:02:22Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: flyte-admin-secrets
  namespace: flyte
  resourceVersion: "684387334"
  uid: <SOME_ID>
type: Opaque
OK, for some reason the client secret got deleted when I was updating the Helm charts. I added it back with
kubectl edit secret -n <flyte-namespace> flyte-admin-secrets
now the flyteadmin is running
but I am still seeing
$ kubectl logs flyteadmin
{"json":{"src":"handlers.go:309"},"level":"info","msg":"Failed to parse Access Token from context. Will attempt to find IDToken. Error: invalid audience [&{[https://flyte.<BASE_URL>] https://<COMPANY>.okta.com/oauth2/<SOME_ID> <SOME_ID> 2025-01-23 22:12:54 +0000 UTC 2025-01-22 22:12:54 +0000 UTC 0001-01-01 00:00:00 +0000 UTC AT.M-<SOME_ID>}], wanted [map[http://flyteadmin:80:{}]]","ts":"2025-01-22T22:12:54Z"}
{"json":{"src":"token.go:100"},"level":"debug","msg":"Could not retrieve id token from metadata rpc error: code = Unauthenticated desc = Request unauthenticated with IDToken","ts":"2025-01-22T22:12:54Z"}
a
Seems like it expects a different audience. What's the audience you set up in Okta? What if you remove the `allowedAudience` list altogether? This is an optional field and should default to the full path of the resource server, a.k.a. the Flyte URL.
w
Figured it out! We were passing in 3 authorizedUris, so I commented out two of them. https://github.com/flyteorg/flyte/blob/8125ae1b96b4b41e237be396c2e0dd21141117ce/flyteadmin/auth/handler_utils.go#L114 -> If the request has a URL attached to it, this function will just return the first URI in the AuthorizedURIs list, which turned out to be flyteadmin:80.
authorizedUris:
  - https://flyte.<BASE_URL>
  # - http://flyteadmin:80
  # - http://flyteadmin.flyte.svc.cluster.local:80
Should I file a GitHub issue around this? It seems like the GetPublicURL function should loop through the authorizedUris to find the most relevant URL based on the URL from the request.
a
@worried-airplane-87065 hmm, looks like the logic is to pick up the one that matches with the host, which in turn is the URI configured in your IdP, so in this case it should have picked up only the first one. Yes, please raise an issue to investigate better.
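As a rough illustration of the host-matching idea discussed here, the following Go sketch picks the authorized URI whose host equals the request host and falls back to the first entry otherwise. This is an assumption-laden sketch, not Flyte's `GetPublicURL`: `pickAuthorizedURL` and `parseAll` are hypothetical helpers, and `https://flyte.example.com` is a placeholder for `https://flyte.<BASE_URL>`.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// pickAuthorizedURL is a hypothetical helper (not Flyte's GetPublicURL).
// It returns the authorized URL whose host matches requestHost, ignoring
// case, and falls back to the first entry when nothing matches.
func pickAuthorizedURL(requestHost string, authorized []*url.URL) *url.URL {
	if len(authorized) == 0 {
		return nil
	}
	for _, u := range authorized {
		if strings.EqualFold(u.Host, requestHost) {
			return u
		}
	}
	return authorized[0]
}

// parseAll parses each raw URL and panics on malformed input, which keeps
// the example short; real code should return the error instead.
func parseAll(raws ...string) []*url.URL {
	out := make([]*url.URL, 0, len(raws))
	for _, raw := range raws {
		u, err := url.Parse(raw)
		if err != nil {
			panic(err)
		}
		out = append(out, u)
	}
	return out
}

func main() {
	// The three URIs from the original values (placeholder public host).
	three := parseAll(
		"https://flyte.example.com",
		"http://flyteadmin:80",
		"http://flyteadmin.flyte.svc.cluster.local:80",
	)
	// Only the public URI, as in the fixed values.
	one := parseAll("https://flyte.example.com")

	// An in-cluster caller such as flytescheduler connects to flyteadmin:80.
	fmt.Println(pickAuthorizedURL("flyteadmin:80", three))
	fmt.Println(pickAuthorizedURL("flyteadmin:80", one))
}
```

If the selected URL is then used as the expected audience, listing the in-cluster addresses would make in-cluster requests expect `http://flyteadmin:80`, while keeping only the public URI makes every request fall back to it. Whether Flyte behaves exactly this way is what the proposed issue would need to confirm.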