# ask-the-community
r
Hi team, our Flyte deployment is acting up today. The Flyteadmin and Flytescheduler pods keep going down randomly. We haven't touched the deployment itself, so we're quite confused about how to root-cause this. In the scheduler logs we see this:
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.20.71.119:81: connect: connection refused"
We're not sure which component it's trying to connect to, or what's causing the flakiness. Replicas are starting and stopping somewhat arbitrarily. Any insight would be appreciated.
d
Hi @Rupsha Chaudhuri, is it possible to identify which component 172.20.71.119 is assigned to? You can probably find out by running:
kubectl get service
kubectl get endpoints
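If the service list is long, the lookup can be scripted. The following is only a rough sketch in Python: it assumes the plain-text table printed by kubectl get service has been captured as a string, and the helper name find_service_by_cluster_ip is made up for illustration. The sample rows are copied from the service listing posted later in this thread.

```python
def find_service_by_cluster_ip(table_text, ip):
    """Return the NAME of the row whose CLUSTER-IP equals ip, or None.

    Hypothetical helper: table_text is the plain-text output of
    `kubectl get service`, whose columns are whitespace-separated.
    """
    lines = [ln for ln in table_text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    name_col = header.index("NAME")
    ip_col = header.index("CLUSTER-IP")
    for line in lines[1:]:
        cols = line.split()
        if len(cols) > ip_col and cols[ip_col] == ip:
            return cols[name_col]
    return None


# Sample rows taken from the listing shared later in the thread.
SAMPLE = """\
NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                AGE
flyteadmin   ClusterIP   10.0.221.40   <none>        87/TCP,80/TCP,81/TCP   15h
postgres     ClusterIP   10.0.63.11    <none>        5432/TCP               15h
"""

print(find_service_by_cluster_ip(SAMPLE, "10.0.221.40"))  # flyteadmin
```

An address that is not in the table returns None, which would suggest looking in another namespace.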
r
OK, let me try that.
j
I have the same error.
☺ kubectl get service -n flyte
NAME                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                AGE
datacatalog         ClusterIP   10.0.208.201   <none>        88/TCP,89/TCP          15h
flyte-pod-webhook   ClusterIP   10.0.72.200    <none>        443/TCP                15h
flyteadmin          ClusterIP   10.0.221.40    <none>        87/TCP,80/TCP,81/TCP   15h
flyteconsole        ClusterIP   10.0.36.237    <none>        80/TCP                 15h
minio               ClusterIP   10.0.102.150   <none>        9000/TCP,9001/TCP      15h
minio-direct        NodePort    10.0.141.140   <none>        9000:30084/TCP         15h
postgres            ClusterIP   10.0.63.11     <none>        5432/TCP               15h
postgres-direct     NodePort    10.0.111.195   <none>        5432:30083/TCP         15h
☺ kubectl get endpoints -n flyte
NAME                ENDPOINTS                                               AGE
datacatalog         10.224.2.126:8089,10.224.2.126:8088                     15h
flyte-pod-webhook   10.224.16.102:9443                                      15h
flyteadmin          10.224.0.248:8089,10.224.0.248:8088,10.224.0.248:8087   15h
flyteconsole        10.224.31.209:8080                                      15h
minio               10.224.6.229:9000,10.224.6.229:9001                     15h
minio-direct        10.224.6.229:9000                                       15h
postgres            10.224.9.110:5432                                       15h
postgres-direct     10.224.9.110:5432                                       15h
Using config file:  [/etc/flyte/config/admin.yaml /etc/flyte/config/db.yaml /etc/flyte/config/logger.yaml]
{"json":{"src":"client.go:181"},"level":"error","msg":"failed to initialize token source provider. Err: failed to fetch auth metadata. Error: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 10.0.221.40:81: connect: connection refused\"","ts":"2023-03-08T01:03:59Z"}
{"json":{"src":"client.go:186"},"level":"warning","msg":"Starting an unauthenticated client because: can't create authenticated channel without a TokenSourceProvider","ts":"2023-03-08T01:03:59Z"}
{"json":{"src":"client.go:65"},"level":"info","msg":"Initialized Admin client","ts":"2023-03-08T01:03:59Z"}
{"json":{"src":"precheck.go:32"},"level":"error","msg":"Attempt failed due to rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 10.0.221.40:81: connect: connection refused\"","ts":"2023-03-08T01:03:59Z"}
{"json":{"src":"client.go:65"},"level":"info","msg":"Initialized Admin client","ts":"2023-03-08T01:04:00Z"}
{"json":{"src":"precheck.go:32"},"level":"error","msg":"Attempt failed due to rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 10.0.221.40:81: connect: connection refused\"","ts":"2023-03-08T01:04:01Z"}
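To see at a glance which address the failing client is dialing, the targets can be pulled out of the JSON log lines. This is a minimal sketch in Python, not part of Flyte: the function name dial_targets and the regular expression are assumptions for illustration, and the sample line is copied from the log above.

```python
import json
import re

# Matches the "dial tcp <ip>:<port>" fragment that Go's net package
# puts in connection errors, as seen in the log lines above.
DIAL_RE = re.compile(r"dial tcp (\d+\.\d+\.\d+\.\d+):(\d+)")


def dial_targets(log_lines):
    """Return a sorted list of (ip, port) pairs found in error-level lines."""
    targets = set()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as "Using config file: ..."
        if not isinstance(record, dict) or record.get("level") != "error":
            continue
        for ip, port in DIAL_RE.findall(record.get("msg", "")):
            targets.add((ip, int(port)))
    return sorted(targets)


# One line copied from the log output above.
SAMPLE_LINE = r'{"json":{"src":"precheck.go:32"},"level":"error","msg":"Attempt failed due to rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 10.0.221.40:81: connect: connection refused\"","ts":"2023-03-08T01:03:59Z"}'

print(dial_targets(["Using config file: ...", SAMPLE_LINE]))
# [('10.0.221.40', 81)]
```

In the service listing above, 10.0.221.40 is the ClusterIP of flyteadmin, and 81/TCP is one of the ports that service exposes.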