Hi, we're trying to deploy flyte via the flyte-bin...
# ask-the-community
l
Hi, we're trying to deploy flyte via the flyte-binary helm chart in an EKS cluster. So far, we connected to our RDS database and tried to connect to our Keycloak instance, and now we're getting a fatal error: "Flyte native scheduler failed to start due to async future was canceled". The oidc_client_secret and client_secret are written to a secret and used via clientSecretsExternalSecretRef. We can't see any relevant logs in Keycloak, so the requests don't seem to arrive. When I disable auth, everything seems to be working well. I've also tested the client credentials for the client and they are working fine (and I see the manual requests in the Keycloak logs). Here is the relevant section in the values:
auth:
  enabled: true
  enableAuthServer: true
  oidc:
    baseUrl: keycloak-svc.keycloak/auth/realms/test
    clientId: flyte
  authorizedUris:
    - https://flyte.example.com
  internal:
    clientSecretHash: {{hash of client secret in client_secrets}}
  clientSecretsExternalSecretRef: client-secrets
I found this thread in the Flyte forum, but the suggestions aren't helping so far (I haven't found the traefik settings for proxy buffer size).
d
@Luis Dunkum There's a need to update auth docs for Keycloak. But in order to confirm, I wanted to see what's the behaviour in your env when:
• You use https://<keycloak-url>/realms/<keycloak-realm> as the structure for the baseUrl
• Also, what are you using for Ingress? What do you plan to use for SSL?
l
• I added https:// to the baseUrl and the message in question disappears, but I can't see any flyte logs in Keycloak and the pod is still failing, so there may be some other problem as well.
• We are using traefik for ingress and SSL. Once I disable auth, I can navigate to the dashboard just fine and also connect via CLI. We checked flyte-the-hard-way, but sadly there are lots of differences in the stack used, so some things aren't that relevant to us.
Ah sorry, I think I was mistaken, the error message disappeared but apparently only because Keycloak can't be reached now, see here:
Error creating auth context [AUTH_CONTEXT_SETUP_FAILED] Error creating oidc provider w/ issuer [https://keycloak-svc.keycloak/auth/realms/test], caused by: Get \"https://keycloak-svc.keycloak/auth/realms/test/.well-known/openid-configuration\": context deadline exceeded
d
@Luis Dunkum apparently in recent versions of Keycloak, the baseURL doesn't use /auth anymore. That's what I mean with using this structure: https://<keycloak-url>/realms/<keycloak-realm>
It comes from this thread: https://flyte-org.slack.com/archives/CP2HDHKE1/p1693218949341099
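To make the two URL shapes concrete, here is a minimal Python sketch. The helper names (keycloak_issuer, discovery_url) and the legacy_auth_prefix flag are made up for illustration and are not part of Flyte or Keycloak; the host and realm are the ones from the values above, and the discovery path is the one that shows up in the error message.

```python
def keycloak_issuer(host, realm, legacy_auth_prefix=False, scheme="https"):
    """Build a Keycloak realm URL (hypothetical helper, not a Flyte/Keycloak API).

    legacy_auth_prefix=True gives the older /auth/realms/<realm> layout,
    False gives the /realms/<realm> layout suggested in this thread.
    """
    prefix = "/auth" if legacy_auth_prefix else ""
    return f"{scheme}://{host}{prefix}/realms/{realm}"


def discovery_url(issuer):
    """Append the OIDC discovery path that gets requested for the issuer."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


# Host and realm taken from the values in this thread.
new_style = keycloak_issuer("keycloak-svc.keycloak", "test")
old_style = keycloak_issuer("keycloak-svc.keycloak", "test", legacy_auth_prefix=True)

print(new_style)                 # https://keycloak-svc.keycloak/realms/test
print(old_style)                 # https://keycloak-svc.keycloak/auth/realms/test
print(discovery_url(new_style))  # https://keycloak-svc.keycloak/realms/test/.well-known/openid-configuration
```

Whichever of the two forms your Keycloak actually serves is the one that should go into baseUrl, including the scheme.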
l
Aah, sorry, my mistake, I'll take a look!
Hmm, it seems as if there is still a problem with the connection. There are some more log entries, but I still can't see any requests for the flyte client in Keycloak.
flyte {"json":{"src":"client.go:63"},"level":"info","msg":"Initialized Admin client","ts":"2023-11-15T10:54:55Z"}
flyte {"json":{"src":"start.go:54"},"level":"info","msg":"Successfully initialized a native flyte scheduler","ts":"2023-11-15T10:54:55Z"}
flyte {"json":{"src":"schedule_executor.go:38"},"level":"info","msg":"Flyte native scheduler started successfully","ts":"2023-11-15T10:54:55Z"}
flyte {"json":{"src":"schedule_executor.go:58"},"level":"info","msg":"Number of schedules retrieved 0","ts":"2023-11-15T10:54:55Z"}
flyte {"json":{"src":"schedule_executor.go:88"},"level":"error","msg":"failed to get future value for catchup due to async future was canceled","ts":"2023-11-15T10:54:55Z"}
flyte {"json":{"src":"schedule_executor.go:89"},"level":"info","msg":"Flyte native scheduler shutdown","ts":"2023-11-15T10:54:55Z"}
flyte {"json":{"src":"start.go:58"},"level":"fatal","msg":"Flyte native scheduler failed to start due to async future was canceled","ts":"2023-11-15T10:54:55Z"}
d
anything interesting in the Traefik logs?
l
I doubt it, since this is all still cluster internal traffic, as baseUrl points to the Keycloak service inside the cluster, but I'll check
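Since the earlier "context deadline exceeded" came from fetching the OIDC discovery document, one way to narrow this down is to request that document directly from a pod inside the cluster. Below is a rough Python sketch using only the standard library. The function names (candidate_urls, probe) are hypothetical, and trying plain http alongside https is only an assumption that the in-cluster service might not terminate TLS; nothing in this thread confirms that.

```python
import json
import urllib.error
import urllib.request

WELL_KNOWN = "/.well-known/openid-configuration"


def candidate_urls(base_url):
    """Return discovery URLs to try for a baseUrl given with or without a scheme."""
    base = base_url.rstrip("/")
    if "://" in base:
        return [base + WELL_KNOWN]
    # No scheme given: try both (assumption: the service may only speak plain HTTP).
    return [f"{scheme}://{base}{WELL_KNOWN}" for scheme in ("https", "http")]


def probe(url, timeout=5.0):
    """Fetch one discovery URL and return (ok, detail) instead of raising."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            doc = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Covers DNS failures, refused connections, timeouts, TLS and JSON errors.
        return False, f"{type(exc).__name__}: {exc}"
    issuer = doc.get("issuer") if isinstance(doc, dict) else None
    return True, issuer or "<no issuer field>"


# Example, to be run from a pod in the same cluster:
# for url in candidate_urls("keycloak-svc.keycloak/realms/test"):
#     print(url, probe(url))
```

If a URL answers, the issuer value in the returned document is worth comparing against the configured baseUrl. If every URL times out, the problem is more likely service name, port, or network policy than Flyte's auth settings.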