Flyte #flyte-support

rough-sugar-4818

09/27/2023, 8:37 PM

Hi all, I'm very happy with the Flyte community, which seems to be very active. 🙂 I'm trying to deploy flyte-binary on EKS using the Terraform template provided here: https://github.com/unionai-oss/deploy-flyte/tree/main/environments/aws by @cold-lock-43986 (and some fixes from @hallowed-autumn-63270). The flyte-binary pod seems to be starting and I don't see any critical error in the logs; however, the pod is restarted (`CrashLoopBackOff`) because the liveness & readiness probes fail to communicate with it. E.g.:

Liveness probe failed: Get "http://10.3.125.74:8088/healthcheck": dial tcp 10.3.125.74:8088: connect: connection refused

Has anyone else seen the same behavior? Or might I have misconfigured either the Terraform template or the Helm chart?

thankful-minister-83577

09/27/2023, 8:55 PM

get the logs `-p`

rough-sugar-4818

09/27/2023, 8:59 PM

Here are the logs I get with `kubectl logs flyte-binary-b958c88b6-mh2nh -n flyte -c flyte`

logs

rough-sugar-4818

09/27/2023, 9:02 PM

What I'm investigating right now: I have doubts about the values I've put in for OIDC in the Helm configuration: https://github.com/flyteorg/flyte/blob/master/charts/flyte-binary/eks-production.yaml#L23-L29

thankful-minister-83577

09/27/2023, 9:04 PM

`-p`

thankful-minister-83577

09/27/2023, 9:05 PM

well, `get pod` first… want to see if there are restarts.

thankful-minister-83577

09/27/2023, 9:05 PM

and if there are restarts, then `-p` to see the previous logs

rough-sugar-4818

09/27/2023, 9:06 PM

haaaaaa I understand now why the previous logs are interesting to watch. Smart.

rough-sugar-4818

09/27/2023, 9:07 PM

logs

rough-sugar-4818

09/27/2023, 9:13 PM

(I forgot to say that there are indeed restarts: since the liveness & readiness probes fail to connect, the pod is restarted.)

average-finland-92144

09/27/2023, 9:18 PM

@rough-sugar-4818 is that the `baseUrl` that you're using? Because that won't work in your environment.

rough-sugar-4818

09/27/2023, 9:20 PM

Nope. I've put the OpenID Connect provider URL shown on the cluster page in the EKS console.

thankful-minister-83577

09/27/2023, 9:20 PM

we need to be better about errors

thankful-minister-83577

09/27/2023, 9:20 PM

https://github.com/flyteorg/flyteadmin/blob/af81751b0718a5eb55a7fa9a13c6ff7e8efbd4e7/pkg/server/service.go#L314

thankful-minister-83577

09/27/2023, 9:20 PM

that error is a terminating error…

thankful-minister-83577

09/27/2023, 9:21 PM

can you make it so that that error goes away

thankful-minister-83577

09/27/2023, 9:21 PM

unfortunately, in the single binary we don’t capture the error and restart. if we did, it would be obvious

thankful-minister-83577

09/27/2023, 9:21 PM

the thread just dies

thankful-minister-83577

09/27/2023, 9:21 PM

😞

rough-sugar-4818

09/27/2023, 9:22 PM

Haaaaa so I've misconfigured the thing: OIDC is not found

rough-sugar-4818

09/27/2023, 9:24 PM

It works!

rough-sugar-4818

09/27/2023, 9:26 PM

Since I'm very new to kube, I'm not very used to digging into logs. Thanks for helping find the root cause. (And I guess it's time for me to go to sleep now.)

thankful-minister-83577

09/27/2023, 9:27 PM

sure let us know…

thankful-minister-83577

09/27/2023, 9:28 PM

we definitely recommend doing auth last

thankful-minister-83577

09/27/2023, 9:28 PM

as it’s the trickiest bit to get right

rough-sugar-4818

09/27/2023, 9:28 PM

What was strange was seeing the pod as `Running`: I thought that if the probes couldn't reach it, it was due to a networking issue. But indeed the pod wasn't really running.

rough-sugar-4818

09/27/2023, 9:29 PM

@thankful-minister-83577 Yeah thanks for the tip. I'm learning on the way. 😅

thankful-minister-83577

09/27/2023, 9:30 PM

yeah, we probably need to add a panic to serve and recover in the goroutine: https://github.com/flyteorg/flyte/blob/e57cac0990fe5ec321e590fc49147014827e6dfd/cmd/single/start.go#L95
