# ask-the-community

Stephen Fromm

07/28/2023, 11:44 AM
Flyteconsole is now giving "502 Bad Gateway" errors. How do I interrogate Flyte to find out what the problem is? And what are solutions to likely causes? I figure something that runs the process for that functionality is dead, and I don't know how to query whether it's dead, nor how to restart it. (FWIW, it's running on top of EKS on AWS.) (Sorry for the naive question. My team has been using Flyte for the past year or so. Unfortunately the leading engineers who introduced Flyte have left our company, and I know only a little about it. This isn't a prod cluster, so it's not 100% important that I get this right.)

Fabio Grätz

07/28/2023, 1:54 PM
Can you pls connect to the k8s cluster and post the result of `kubectl -n flyte get pods`?

Stephen Fromm

07/28/2023, 1:57 PM
I get `No resources found in flyte namespace.`
Though (again, I don't know much about these things) there's stuff that looks flyte-ish:
NAMESPACE        NAME                           READY  STATUS  RESTARTS     AGE
<snipped irrelevant content>
great-falls       flyte-pod-webhook-67b567f698-pbkg7            1/1   Running  0        170d
great-falls       flyteadmin-6bfb478fdf-6hq8m               1/1   Running  0        176d
great-falls       flyteadmin-6bfb478fdf-fw4z8               1/1   Running  0        176d
great-falls       flyteconsole-66d7468bf9-99tkk              1/1   Running  27 (9d ago)   176d
great-falls       flyteconsole-66d7468bf9-lj82g              1/1   Running  24 (3d17h ago)  176d
great-falls       flytepropeller-c888bf854-zpcgx              1/1   Running  0        176d
great-falls       flytescheduler-58dcdf9b96-8t4wd             1/1   Running  0        170d

Fabio Grätz

07/28/2023, 1:58 PM
Ah ok, it's not running in the default namespace.
When you do `kubectl get pods --all-namespaces`, is there anything that is not running? Like completed or failed

Stephen Fromm

07/28/2023, 1:58 PM
$ kubectl -n great-falls get pods
NAME                 READY  STATUS  RESTARTS     AGE
datacatalog-c485d95d4-75vgn     1/1   Running  0        176d
datacatalog-c485d95d4-pbkfk     1/1   Running  0        176d
flyte-pod-webhook-67b567f698-pbkg7  1/1   Running  0        170d
flyteadmin-6bfb478fdf-6hq8m     1/1   Running  0        176d
flyteadmin-6bfb478fdf-fw4z8     1/1   Running  0        176d
flyteconsole-66d7468bf9-99tkk    1/1   Running  27 (9d ago)   176d
flyteconsole-66d7468bf9-lj82g    1/1   Running  24 (3d17h ago)  176d
flytepropeller-c888bf854-zpcgx    1/1   Running  0        176d
flytescheduler-58dcdf9b96-8t4wd   1/1   Running  0        170d
syncresources-5d9df9b699-gjdqx    1/1   Running  0        170d

Fabio Grätz

07/28/2023, 1:58 PM
or `CrashLoopBackOff`
Some restarts in flyteconsole, but they are a few days old and it seems to be running now
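The check for pods that are not running (completed, failed, CrashLoopBackOff) can be scripted. The following is a minimal sketch, assuming the default column layout of `kubectl get pods --all-namespaces` (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE); the function name `unhealthy_pods` is hypothetical. It filters on the STATUS column instead of using `--field-selector=status.phase!=Running`, because a pod in CrashLoopBackOff usually still reports the phase `Running`.

```shell
# Hypothetical helper: reads `kubectl get pods --all-namespaces` output on stdin and
# prints pods that look unhealthy: STATUS is neither Running nor Completed, or the pod
# is Running with fewer ready containers than it has containers.
# Assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk 'NR > 1 {
    split($3, ready, "/")        # READY looks like "1/1" (ready/total)
    if ($4 == "Completed") next  # finished Job pods are expected
    if ($4 != "Running" || ready[1] != ready[2]) print $1, $2, $3, $4
  }'
}

# Usage (requires cluster access):
#   kubectl get pods --all-namespaces | unhealthy_pods
# For a pod that has restarted, the logs of the previous container often say why:
#   kubectl -n great-falls logs flyteconsole-66d7468bf9-99tkk --previous
```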

Stephen Fromm

07/28/2023, 1:59 PM
Everything is `Running`, except there's some prometheus stuff that is `Pending` (which doesn't seem relevant).
No, I'm still getting `502 Bad Gateway`. It was working yesterday.

Fabio Grätz

07/28/2023, 2:21 PM
Are there any nginx pods in your namespaces?
Or do you know how flyte console is exposed?
Does the error when accessing flyte console through the browser say anything about nginx?

Stephen Fromm

07/28/2023, 2:23 PM
I don't see anything (in the output of `get pods`, in the error message, or in the local documentation at my company) about nginx.

Fabio Grätz

07/28/2023, 2:24 PM
Any other ingress? How do you access flyte console?
Is it exposed to the public internet? Or to the local network?

Stephen Fromm

07/28/2023, 2:24 PM
We use VPN.
Under `kubectl describe ep`, I see some things which are a little suspicious:
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2023-07-27T17:52:46Z
and
Subsets:
 Addresses:     <none>
 NotReadyAddresses: 5.0.12.118
 Ports:
  Name Port Protocol
  ---- ---- --------
  http 7979 TCP
Those are both under `Name: external-dns`.
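In the `describe ep` output above, `Addresses: <none>` together with an IP listed under `NotReadyAddresses` means the service currently has no ready backing pod. A minimal sketch for finding every such service at once, assuming the default columns of `kubectl get endpoints --all-namespaces` (NAMESPACE, NAME, ENDPOINTS, AGE), where the ENDPOINTS column lists only ready addresses; `services_without_ready_endpoints` is a hypothetical name.

```shell
# Hypothetical helper: reads `kubectl get endpoints --all-namespaces` output on stdin
# and prints namespace/name for every service whose ENDPOINTS column is <none>,
# i.e. services that currently have no ready backing pods.
# Assumes the default columns: NAMESPACE NAME ENDPOINTS AGE.
services_without_ready_endpoints() {
  awk 'NR > 1 && $3 == "<none>" { print $1 "/" $2 }'
}

# Usage (requires cluster access):
#   kubectl get endpoints --all-namespaces | services_without_ready_endpoints
# Then look at the pods behind a flagged service, for example:
#   kubectl -n default describe endpoints external-dns
```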

Fabio Grätz

07/28/2023, 2:58 PM
What’s the name of the corresponding service?
s

Stephen Fromm

07/28/2023, 3:01 PM
Not sure. The only thing I see associated is `external-dns`. Is there a command I should type to get that?

Fabio Grätz

07/28/2023, 3:02 PM
Mh, I still don’t fully understand what your deployment looks like.
What is the output of `kubectl get service -n great-falls`?
Or also in all namespaces?
There must be some kind of ingress or reverse proxy, I’d say.
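Finding out how the console is exposed usually comes down to two lists: services of type `LoadBalancer`, which get a cloud load balancer of their own, and `Ingress` objects, which route paths through a shared reverse proxy such as an AWS ALB or nginx. A minimal sketch for the first list, assuming the default columns of `kubectl get service --all-namespaces` (NAMESPACE, NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, PORT(S), AGE); `load_balancer_services` is a hypothetical name.

```shell
# Hypothetical helper: reads `kubectl get service --all-namespaces` output on stdin
# and prints "namespace/name external-address" for every LoadBalancer service.
# Assumes the default columns: NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE.
load_balancer_services() {
  awk 'NR > 1 && $3 == "LoadBalancer" { print $1 "/" $2, $5 }'
}

# Usage (requires cluster access):
#   kubectl get service --all-namespaces | load_balancer_services
# Ingress objects are listed separately:
#   kubectl get ingress --all-namespaces
```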

Stephen Fromm

07/28/2023, 3:03 PM
$ kubectl get service -n great-falls
NAME        TYPE      CLUSTER-IP    EXTERNAL-IP                                PORT(S)                         AGE
datacatalog     LoadBalancer  10.100.244.46  a2491ed93a79c4e4c9c3659352ba7ad5-1661450188.us-east-1.elb.amazonaws.com  8089:32312/TCP,88:31369/TCP,89:32734/TCP         176d
flyte-pod-webhook  ClusterIP   10.100.53.162  <none>                                  443/TCP                         176d
flyteadmin     LoadBalancer  10.100.205.238  ab6eeae927cb148ea831f9a0482954f0-911285910.us-east-1.elb.amazonaws.com  80:31117/TCP,81:30601/TCP,87:30822/TCP,10254:31187/TCP  176d
flyteconsole    LoadBalancer  10.100.150.24  a1f320958d4844bd081a46e0c3fd4485-1851160411.us-east-1.elb.amazonaws.com  80:30722/TCP                       176d
$ kubectl get service
NAME      TYPE    CLUSTER-IP    EXTERNAL-IP  PORT(S)  AGE
external-dns  ClusterIP  10.100.185.214  <none>    7979/TCP  177d
kubernetes   ClusterIP  10.100.0.1    <none>    443/TCP  177d

Fabio Grätz

07/28/2023, 3:05 PM
(Are these IPs all reachable only from within the VPN? Otherwise maybe redact them)

Stephen Fromm

07/28/2023, 3:05 PM
$ kubectl get service -A | grep -v prometheus
NAMESPACE        NAME                         TYPE      CLUSTER-IP    EXTERNAL-IP                                PORT(S)                         AGE
actions-runner-system  actions-runner-controller-metrics-service      ClusterIP   10.100.186.11  <none>                                  8443/TCP                         136d
actions-runner-system  actions-runner-controller-webhook          ClusterIP   10.100.193.93  <none>                                  443/TCP                         136d
cert-manager      cert-manager                     ClusterIP   10.100.159.173  <none>                                  9402/TCP                         136d
cert-manager      cert-manager-webhook                 ClusterIP   10.100.183.161  <none>                                  443/TCP                         136d
default         external-dns                     ClusterIP   10.100.185.214  <none>                                  7979/TCP                         177d
default         kubernetes                      ClusterIP   10.100.0.1    <none>                                  443/TCP                         177d
great-falls       datacatalog                     LoadBalancer  10.100.244.46  a2491ed93a79c4e4c9c3659352ba7ad5-1661450188.us-east-1.elb.amazonaws.com  8089:32312/TCP,88:31369/TCP,89:32734/TCP         176d
great-falls       flyte-pod-webhook                  ClusterIP   10.100.53.162  <none>                                  443/TCP                         176d
great-falls       flyteadmin                      LoadBalancer  10.100.205.238  ab6eeae927cb148ea831f9a0482954f0-911285910.us-east-1.elb.amazonaws.com  80:31117/TCP,81:30601/TCP,87:30822/TCP,10254:31187/TCP  176d
great-falls       flyteconsole                     LoadBalancer  10.100.150.24  a1f320958d4844bd081a46e0c3fd4485-1851160411.us-east-1.elb.amazonaws.com  80:30722/TCP                       176d
kube-system       aws-load-balancer-webhook-service          ClusterIP   10.100.154.57  <none>                                  443/TCP                         177d
kube-system       kube-dns                       ClusterIP   10.100.0.10   <none>                                  53/UDP,53/TCP                      177d
monitoring       alertmanager-operated                ClusterIP   None       <none>                                  9093/TCP,9094/TCP,9094/UDP                177d
I assume nothing is reachable unless I'm in the VPN, but I don't really know.

Fabio Grätz

07/28/2023, 3:06 PM
a1f320958d4844bd081a46xxxxxx485-1851160411.us-east-1.elb.amazonaws.com/console is how you would try to reach the console?

Stephen Fromm

07/28/2023, 3:09 PM

Fabio Grätz

07/28/2023, 3:10 PM
And there is a DNS record configured somewhere in AWS that maps this domain to a1f320958d4844bd081a46e0c3fd4485-1851160411.us-east-1.elb.amazonaws.com?
Does `kubectl get ingress --all-namespaces` give anything?

Stephen Fromm

07/28/2023, 3:13 PM
$ kubectl get ingress --all-namespaces
NAMESPACE   NAME            CLASS  HOSTS                  ADDRESS                                      PORTS  AGE
great-falls  flyte-core         <none>  *                    internal-k8s-flyte-4805ca007d-2146548039.us-east-1.elb.amazonaws.com        80   176d
great-falls  flyte-core-grpc      <none>  *                    internal-k8s-flyte-4805ca007d-2146548039.us-east-1.elb.amazonaws.com        80   176d
monitoring  prometheus-stack-grafana  <none>  grafana-great-falls.dev.embarkvet.com  internal-k8s-monitori-promethe-900828f478-1937690874.us-east-1.elb.amazonaws.com  80   177d

Fabio Grätz

07/28/2023, 3:14 PM
Ok, and `kubectl -n great-falls describe ingress flyte-core` (and the same for `flyte-core-grpc`)?

Fabio Grätz

07/28/2023, 3:14 PM
Makes sense
that is the external address of the ingress

Stephen Fromm

07/28/2023, 3:15 PM
$ kubectl -n great-falls describe ingress flyte-core
Name:       flyte-core
Labels:      app.kubernetes.io/managed-by=Helm
Namespace:    great-falls
Address:     internal-k8s-flyte-4805ca007d-2146548039.us-east-1.elb.amazonaws.com
Ingress Class:  <none>
Default backend: <default>
Rules:
 Host    Path Backends
 ----    ---- --------
 *      
       /*        ssl-redirect:use-annotation (<error: endpoints "ssl-redirect" not found>)
       /console     flyteconsole:80 ()
       /console/*    flyteconsole:80 ()
       /api       flyteadmin:80 ()
       /api/*      flyteadmin:80 ()
       /healthcheck   flyteadmin:80 ()
       /v1/*      flyteadmin:80 ()
       /.well-known   flyteadmin:80 ()
       /.well-known/*  flyteadmin:80 ()
       /login      flyteadmin:80 ()
       /login/*     flyteadmin:80 ()
       /logout     flyteadmin:80 ()
       /logout/*    flyteadmin:80 ()
       /callback    flyteadmin:80 ()
       /callback/*   flyteadmin:80 ()
       /me       flyteadmin:80 ()
       /config     flyteadmin:80 ()
       /config/*    flyteadmin:80 ()
       /oauth2     flyteadmin:80 ()
       /oauth2/*    flyteadmin:80 ()
Annotations: alb.ingress.kubernetes.io/actions.ssl-redirect:
        {"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}
       alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:11111111111111:certificate/[UUID-ish ID]
       alb.ingress.kubernetes.io/group.name: flyte
       alb.ingress.kubernetes.io/listen-ports: [{"HTTP": 80}, {"HTTPS":443}]
       alb.ingress.kubernetes.io/scheme: internal
       alb.ingress.kubernetes.io/tags: service_instance=production
       external-dns.alpha.kubernetes.io/hostname: somethingsomething-great-falls.dev.companyname.com
       kubernetes.io/ingress.class: alb
       meta.helm.sh/release-name: great-falls
       meta.helm.sh/release-namespace: great-falls
       nginx.ingress.kubernetes.io/app-root: /console
Events:    <none>
$ kubectl -n great-falls describe ingress flyte-core-grpc
Name:       flyte-core-grpc
Labels:      app.kubernetes.io/managed-by=Helm
Namespace:    great-falls
Address:     internal-k8s-flyte-4805ca007d-2146548039.us-east-1.elb.amazonaws.com
Ingress Class:  <none>
Default backend: <default>
Rules:
 Host    Path Backends
 ----    ---- --------
 *      
       /flyteidl.service.AdminService      flyteadmin:81 ()
       /flyteidl.service.AdminService/*     flyteadmin:81 ()
       /flyteidl.service.DataProxyService    flyteadmin:81 ()
       /flyteidl.service.DataProxyService/*   flyteadmin:81 ()
       /flyteidl.service.AuthMetadataService   flyteadmin:81 ()
       /flyteidl.service.AuthMetadataService/*  flyteadmin:81 ()
       /flyteidl.service.IdentityService     flyteadmin:81 ()
       /flyteidl.service.IdentityService/*    flyteadmin:81 ()
       /grpc.health.v1.Health          flyteadmin:81 ()
       /grpc.health.v1.Health/*         flyteadmin:81 ()
Annotations: alb.ingress.kubernetes.io/actions.ssl-redirect:
        {"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}
       alb.ingress.kubernetes.io/backend-protocol-version: HTTP2
       alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:11111111111:certificate/[UUID-ish ID]
       alb.ingress.kubernetes.io/group.name: flyte
       alb.ingress.kubernetes.io/listen-ports: [{"HTTP": 80}, {"HTTPS":443}]
       alb.ingress.kubernetes.io/scheme: internal
       alb.ingress.kubernetes.io/tags: service_instance=production
       external-dns.alpha.kubernetes.io/hostname: somethingsomething-great-falls.dev.companyname.com
       kubernetes.io/ingress.class: alb
       meta.helm.sh/release-name: great-falls
       meta.helm.sh/release-namespace: great-falls
       nginx.ingress.kubernetes.io/app-root: /console
       nginx.ingress.kubernetes.io/backend-protocol: GRPC
Events:    <none>

Fabio Grätz

07/28/2023, 3:16 PM
I’ve never done this with AWS, but in GCP there is a page in the UI where one can see the health of the ingress.
Can you pls check whether there is something similar in AWS?
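On AWS, the closest equivalent to that GCP page is probably the target health of the ALB's target groups, visible in the EC2 console under Target Groups or through the AWS CLI. An ALB typically answers with 502 when its connection to a target fails or the target's response is invalid, so unhealthy or unreachable targets are a likely place to look. The sketch below assumes AWS CLI v2 credentials for the cluster's account and region, and that the ALB name can be read off the ingress address: for an internal ALB the DNS name has the form `internal-<name>-<id>.<region>.elb.amazonaws.com`, which would make the name here `k8s-flyte-4805ca007d`. Both function names are hypothetical.

```shell
# Sketch, assuming AWS CLI v2 is configured for the cluster's account and region.

# Print "<target-id> <port> <state> <reason>" for every target behind the named ALB.
alb_target_health() {
  lb_name="$1"    # e.g. k8s-flyte-4805ca007d (assumed from the ingress address)
  lb_arn=$(aws elbv2 describe-load-balancers --names "$lb_name" \
    --query 'LoadBalancers[0].LoadBalancerArn' --output text)
  for tg_arn in $(aws elbv2 describe-target-groups --load-balancer-arn "$lb_arn" \
      --query 'TargetGroups[].TargetGroupArn' --output text); do
    aws elbv2 describe-target-health --target-group-arn "$tg_arn" \
      --query 'TargetHealthDescriptions[].[Target.Id,Target.Port,TargetHealth.State,TargetHealth.Reason]' \
      --output text
  done
}

# Keep only targets whose state (third column) is not "healthy".
unhealthy_targets() {
  awk '$3 != "healthy"'
}

# Usage (requires AWS credentials):
#   alb_target_health k8s-flyte-4805ca007d | unhealthy_targets
```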

Stephen Fromm

07/28/2023, 3:20 PM
I'll have to look at the docs. Might be away from the keyboard for a while for other reasons. It'd be great if you checked this thread later today, but if you're too busy I understand (since you're doing this for free!).

Fabio Grätz

07/28/2023, 3:21 PM
I’ll try to check tonight 🙂 5:30 pm here

Stephen Fromm

07/28/2023, 8:56 PM
Unfortunately I've done some looking and cannot find anything more about ingress health.

Fabio Grätz

07/31/2023, 4:44 PM
Unfortunately, I also don’t know where this is found in AWS. At least on GCP, `Events: <none>` would be a sign that the ingress is not doing so well, but I can’t tell for AWS, sorry.

Stephen Fromm

07/31/2023, 4:46 PM
My colleagues and I looked at it more. It appears the pods were in the Ready state, but the nodes were NotReady. I would have thought that can't happen, though I did find a GitHub issue (for k8s) where it happens, at least for older versions of k8s (and we're using an outdated version). So now we're cordoning and draining the nodes.
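Node readiness can be checked the same way as pod status. A minimal sketch, assuming the default columns of `kubectl get nodes` (NAME, STATUS, ROLES, AGE, VERSION); `not_ready_nodes` is a hypothetical name, and a cordoned but healthy node (`Ready,SchedulingDisabled`) is deliberately not reported.

```shell
# Hypothetical helper: reads `kubectl get nodes` output on stdin and prints the names
# of nodes whose STATUS does not start with "Ready" (e.g. NotReady).
# Assumes the default columns: NAME STATUS ROLES AGE VERSION.
not_ready_nodes() {
  awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}

# Usage (requires cluster access). Cordon stops new pods from being scheduled on a
# node; drain then evicts the pods already running there:
#   for node in $(kubectl get nodes | not_ready_nodes); do
#     kubectl cordon "$node"
#     kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
#   done
# Older kubectl releases use --delete-local-data instead of --delete-emptydir-data.
```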
We figured out what happened. In an attempt to give me access to read the cluster state (using k9s, kubectl, etc), we added an identity mapping with me as the user, which shadowed the system user, which basically halted all communication within the network. Whoops.
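On EKS, IAM identities are mapped to Kubernetes users through the `aws-auth` ConfigMap in the `kube-system` namespace, and worker nodes rely on their instance role being mapped to the username `system:node:{{EC2PrivateDNSName}}` (groups `system:bootstrappers` and `system:nodes`). A coarse sketch for catching this kind of mistake, assuming the standard EKS node mapping; both function names are hypothetical. The first check only notices when the node username has been removed or overwritten, and the second lists role ARNs that appear more than once, which is one way a new entry could shadow an existing one.

```shell
# Hypothetical helpers that read the aws-auth ConfigMap as YAML on stdin.

# Succeeds only if some mapRoles entry still uses the standard EKS node username.
aws_auth_has_node_mapping() {
  grep -qF 'username: system:node:{{EC2PrivateDNSName}}'
}

# Prints every rolearn that is mapped more than once.
duplicate_role_arns() {
  grep -o 'rolearn: *[^ ]*' | awk '{ print $2 }' | sort | uniq -d
}

# Usage (requires cluster access):
#   kubectl -n kube-system get configmap aws-auth -o yaml > aws-auth.yaml
#   aws_auth_has_node_mapping < aws-auth.yaml || echo "node role mapping missing"
#   duplicate_role_arns < aws-auth.yaml
```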

Fabio Grätz

07/31/2023, 6:15 PM
Good to hear you figured it out 🙂

Stephen Fromm

07/31/2023, 6:16 PM
Sorry to have wasted so much of your time... 😢

Fabio Grätz

07/31/2023, 6:19 PM
No worries 🙂