# ask-the-community
q
Hi all, I'm very happy with the flyte community which seems to be very active. 🙂 I'm trying to deploy flyte-binary on EKS using the terraform template provided here: https://github.com/unionai-oss/deploy-flyte/tree/main/environments/aws by @David Espejo (and some fixes from @Uria Franko). The flyte-binary pod seems to be starting and I don't see any critical error in the logs, however the pod is restarted (`CrashLoopBackOff`) because the liveness & readiness probes fail to communicate with it. E.g.:
Liveness probe failed: Get "http://10.3.125.74:8088/healthcheck": dial tcp 10.3.125.74:8088: connect: connection refused
Has anyone else witnessed the same behavior? Or might I have misconfigured either the terraform template or the helm chart?
y
get the logs `-p`?
q
Here are the logs I get with `kubectl logs flyte-binary-b958c88b6-mh2nh -n flyte -c flyte`
What I'm investigating right now: I have doubts about the values I've put for OIDC in the helm configuration: https://github.com/flyteorg/flyte/blob/master/charts/flyte-binary/eks-production.yaml#L23-L29
y
`-p`?
well `get pod` first… want to see if there are restarts.
and if there are restarts, then `-p` to see the previous logs
q
haaaaaa I understand now why the previous logs are interesting to watch. Smart.
logs
(I forgot to say that there are restarts indeed: since the liveness & readiness probes fail to connect, the pod is restarted)
d
@Quentin Chenevier is that the `baseUrl` that you're using? bc that won't work in your environment
q
Nope. I've put the OpenID Connect provider URL shown on the cluster page in the EKS console.
y
we need to be better about errors
that error is a terminating error…
can you make it so that that error goes away
unf. in the single binary we don’t capture the error and restart. then it would be obvious
the thread just dies
😞
q
Haaaaa so I've misconfigured the thing: OIDC is not found
It works!
Since I'm very new to kube, I'm not very used to digging into logs. Thanks for helping find the root cause. (and I guess it's time for me to go to sleep now).
y
sure let us know…
we definitely recommend doing auth last
as it’s the trickiest bit to get right
q
What was strange was seeing the pod as `Running`. I thought that if the probes couldn't reach it, it was due to a networking issue. But indeed the pod wasn't really running.
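For readers hitting the same symptom: the probe message said `connection refused`, which usually means the pod IP answered but nothing was listening on the port, whereas a network path problem more often shows up as a timeout. Below is a minimal Go sketch of that distinction, assuming a Unix-like host. The helper names `classifyDial` and `closedLocalAddr` are made up for illustration and are not part of Flyte or Kubernetes.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
	"time"
)

// classifyDial tries a TCP connection and gives a rough reading of the result.
// Hypothetical helper for illustration only.
func classifyDial(addr string) string {
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err == nil {
		conn.Close()
		return "listening" // a process accepted the connection
	}
	if errors.Is(err, syscall.ECONNREFUSED) {
		// The host replied, but no process is bound to the port.
		return "refused"
	}
	var ne net.Error
	if errors.As(err, &ne) && ne.Timeout() {
		// No reply at all: more typical of routing or firewall problems.
		return "timeout"
	}
	return "other"
}

// closedLocalAddr returns a loopback address whose listener was just closed,
// so nothing should be listening there anymore.
func closedLocalAddr() string {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	addr := ln.Addr().String()
	ln.Close()
	return addr
}

func main() {
	fmt.Println(classifyDial(closedLocalAddr()))
}
```

In this thread the server inside the container had already died, so the probe landed in the "refused" case even though the pod still showed as `Running`.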
@Yee Yeah thanks for the tip. I'm learning on the way. 😅
y
yeah we probably need to add a panic to serve and recover in the goroutine https://github.com/flyteorg/flyte/blob/e57cac0990fe5ec321e590fc49147014827e6dfd/cmd/single/start.go#L95
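The idea mentioned here, surfacing a failure from a goroutine instead of letting it die silently, could look roughly like the sketch below. This is not Flyte's actual code: `runGuarded` and the stand-in serve function are hypothetical, and the error text is only an example.

```go
package main

import (
	"errors"
	"fmt"
)

// runGuarded runs serve in its own goroutine and reports how it ended on the
// returned channel: the error it returned, or a panic converted to an error.
// Hypothetical helper for illustration only.
func runGuarded(name string, serve func() error) <-chan error {
	done := make(chan error, 1) // buffered so the goroutine never blocks on send
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("%s panicked: %v", name, r)
			}
		}()
		done <- serve()
	}()
	return done
}

func main() {
	// Stand-in for a component that fails during startup.
	failing := func() error {
		return errors.New("auth configuration invalid")
	}

	if err := <-runGuarded("admin", failing); err != nil {
		// A real binary would log this and exit non-zero, so the cause
		// shows up in the container logs before the restart.
		fmt.Println("fatal:", err)
	}
}
```

With something like this the process would stop with a clear message when a component fails, rather than staying up with no listener behind the probe port.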