# flyte-deployment
e
Hi folks - is there a quick sanity check that can be used to verify that a control plane and data plane are configured properly and talking to one another? (Without actually scheduling a workflow to run that is)
d
@Ethan Brown I usually check the status of the `sync-resources` Pod. If, after configuring everything, that Pod is `Running`, that's a good sign
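A rough sketch of what I run (the exact pod name and namespace depend on your Helm release and chart version):
```bash
# On the control plane cluster: look for the resource-sync pod and check it's Running
kubectl -n flyte get pods | grep -i sync

# Tail its logs to confirm it can reach the data plane clusters
kubectl -n flyte logs <sync-resources-pod-name> --tail=50
```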
e
I'm digging in a little bit further now and I actually see some rpc errors getting logged in `flytepropeller` inside the data plane. Jobs look to be sent over to the data plane properly, but it looks to me like traffic isn't properly flowing out of that cluster over GRPC back to the control plane. Double-checking my ingress definition now (and my ingress logs)
```json
{"json":{"exec_id":"f194818b7dbea4957858","ns":"flytesnacks-development","res_ver":"10430815","routine":"worker-1","wf":"flytesnacks:development:.flytegen.basic-task.slope"},"level":"warning","msg":"Event recording failed. Error [EventSinkError: Error sending event, caused by [rpc error: code = Unavailable desc = connection error: desc = \"error reading server preface: http2: frame too large\"]]","ts":"2023-11-29T15:41:06Z"}

{"json":{"exec_id":"f194818b7dbea4957858","ns":"flytesnacks-development","res_ver":"10430815","routine":"worker-1","wf":"flytesnacks:development:.flytegen.basic-task.slope"},"level":"error","msg":"Error when trying to reconcile workflow. Error [[]]. Error Type[*errors.WorkflowErrorWithCause]","ts":"2023-11-29T15:41:06Z"}
```
d
what Ingress controller are you using?
e
nginx-ingress (unfortunately ;))
d
on EKS?
e
Yup!
d
from a reference implementation, these are the annotations used:
```yaml
common:
  ingress:
    host: "{{ .Values.userSettings.hostName }}"
    tls:
      enabled: true
      secretName: flyte-secret-tls
    annotations:
      kubernetes.io/ingress.class: nginx
      ingress.kubernetes.io/rewrite-target: /
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
      cert-manager.io/issuer: "letsencrypt-production"
      acme.cert-manager.io/http01-edit-in-place: "true"
    # --- separateGrpcIngress puts GRPC routes into a separate ingress if true. Required for certain ingress controllers like nginx.
    separateGrpcIngress: true
    # --- Extra Ingress annotations applied only to the GRPC ingress. Only makes sense if `separateGrpcIngress` is enabled.
    separateGrpcIngressAnnotations:
      nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
```
You can remove the cert-manager related content
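A quick way to sanity-check that the chart actually rendered two Ingress objects (one HTTP, one gRPC) and that the gRPC one carries the backend-protocol annotation; this assumes everything is in the `flyte` namespace:
```bash
# With separateGrpcIngress: true there should be two Ingress objects
kubectl -n flyte get ingress

# The gRPC one should carry the GRPC backend-protocol annotation
kubectl -n flyte describe ingress | grep -i backend-protocol
```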
e
Yeah, I should have mentioned I'm already using the separate ingress definition with the GRPC annotation
Just for sanity, the grpc endpoint config for the remote data plane should be like this, yeah?
```yaml
configmap:
  admin:
    admin:
      endpoint: publichost.domain.com:443
      insecure: false
  catalog:
    catalog-cache:
      endpoint: publichost.domain.com:443
      insecure: false
```
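One way I'm double-checking that propeller actually picked those values up (the configmap name here is an assumption based on a default flyte-core install, so it may differ):
```bash
# On the data plane cluster: confirm the rendered propeller config points at the public endpoint
kubectl -n flyte get configmap flyte-propeller-config -o yaml | grep -B 2 -A 2 "publichost.domain.com"
```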
Are there particular services that I could call using `grpcurl` from outside the ingress to validate things?
d
there should be a flyteadmin Service exposing the 8089 port if I remember correctly
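Something along these lines should confirm it (assuming the default `flyte` namespace; on flyte-core the Service usually exposes the gRPC port as 81, targeting 8089 on the pod):
```bash
# Inspect the flyteadmin Service and its port mappings
kubectl -n flyte get svc flyteadmin -o yaml | grep -A 10 "ports:"
```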
e
From inside the cluster, an easy way to test connectivity to the service (over plaintext) is:
`grpcurl -plaintext flyteadmin.flyte:81 list`
Then it's easy to move outward from there to find the point at which the service becomes unreachable.
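And from outside the ingress over TLS, something along these lines (assuming server reflection is reachable; if auth is enabled you may get an authentication error instead of a transport error, which still proves the gRPC path works):
```bash
# From outside the cluster, through the ingress (TLS on 443)
grpcurl publichost.domain.com:443 list

# Or call a specific admin RPC; a transport-level failure here points back at the ingress
grpcurl publichost.domain.com:443 flyteidl.service.AdminService/ListProjects
```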