# ask-the-community
g
Hi, I'm running into an issue (404 Not Found) on an EKS deployment where Flyteconsole can't find Flyteadmin. It's trying to access resources at `{FLYTECONSOLE_ENDPOINT}/projects` that are not there, because they are at `{FLYTEADMIN_ENDPOINT}/projects`.
Did I misconfigure something? How do I tell Flyteconsole not to look at its own base URL for that?
❯ k get svc -n flyte
NAME           TYPE           CLUSTER-IP       EXTERNAL-IP                                 PORT(S)                                                  AGE
...
flyteadmin     LoadBalancer   172.20.236.182   {FLYTEADMIN_ENDPOINT}.elb.amazonaws.com     80:30338/TCP,81:32698/TCP,87:30451/TCP,10254:30759/TCP   29h
flyteconsole   LoadBalancer   172.20.137.85    {FLYTECONSOLE_ENDPOINT}.elb.amazonaws.com   80:30048/TCP                                             29h
...
I think I fixed it by setting `ADMIN_API_URL` in the flyte-console-config configmap, but I'm not sure that is the correct solution, as it's not mentioned anywhere in the deployment walkthrough.
Now I'm running into this when trying to register a workflow against the remote deployment:
Failed with Exception Code: SYSTEM:Unknown
RPC Failed, with Status: StatusCode.INTERNAL
        details: failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
        status code: 403, request id: f9376664-606d-441d-8b49-3534a6b368cb
        Debug string UNKNOWN:Error received from peer ipv4:{FLYTEADMIN_IP_ADDRESS}:81 {grpc_message:"failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: f9376664-606d-441d-8b49-3534a6b368cb", grpc_status:13, created_time:"2023-07-20T17:13:55.233969+02:00"}
My `~/.flyte/config.yaml`:
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///{FLYTEADMIN_ENDPOINT}.elb.amazonaws.com:81
  insecure: true
  insecureSkipVerify: true
logger:
  show-source: true
  level: 0
where `81` is the gRPC port exposed by the flyteadmin LoadBalancer service:
❯ k get svc -n flyte flyteadmin -o yaml
apiVersion: v1
kind: Service
...
spec:
  ...
  ports:
  - name: http
    nodePort: ...
    port: 80
    protocol: TCP
    targetPort: 8088
  - name: grpc
    nodePort: ...
    port: 81
    protocol: TCP
    targetPort: 8089
  selector:
    app.kubernetes.io/instance: flyte-core
    app.kubernetes.io/name: flyteadmin
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - hostname: {FLYTEADMIN_ENDPOINT}.elb.amazonaws.com
c
cc @Jason Porter @Yee we should update the docs
y
@Geert which docs were you following?
g
I'm using the Helm chart and a reference setup from my company. I'm not using ingress for this at the moment; I assumed it should work with the individual Services. Is that not the case?
Also without auth, to start with a minimal setup first.
y
which helm chart? flyte-core?
looking at your secrets change as well today
(ty btw)
g
Yeah
flyte-core
Will fix the merge conflict later today as well
y
i would use the flyte-binary helm chart if that’s easy to change.
if not that’s okay. both are supported but the one binary version is newer and in a cleaner state
g
Alright I'll try that, but to understand, I would not need the Ingress set up to get it working right?
And I should use the gRPC endpoint of flyteadmin in my flyte config.yaml
y
if you’re willing to port-forward things locally then yeah you never need ingress
but that’s not very fun to use
g
No, without port-forwarding
The flyteadmin, flyteconsole, and datacatalog components are exposed via Services of type LoadBalancer
y
and they get public IPs?
g
Yeah
They all seem to be working fine
It's just a question of how to connect pyflyte to register
y
i actually haven’t used that before… in the past i’ve only ever used the alb ingress controller, which creates the load balancers
g
Hmm right, maybe I should just fix that then 😄
Keep you posted! Thanks for the insights!
Oh final question for now if you have time:
And I should use the gRPC endpoint of flyteadmin in my flyte config.yaml
y
yes
j
@Geert: if you don't want to use ALB, you can probably just run your own reverse proxy like nginx or envoy. we do that in sandbox, but it's more work than you should need. the ingresses are already bundled in the chart and are usable with nginx ingress controller, kong, traefik, ALB, etc.
Probably cheaper to run a single LB with ingress than it is to run one for each component too
g
I see, thanks for the pointers @jeev! I'll get on it in the morning 🙂
I couldn't resist trying it already. With the `flyte-binary` chart the Pod comes up and I can view the UI, but on registering a workflow I get this error:
Failed with Exception Code: SYSTEM:Unknown
RPC Failed, with Status: StatusCode.UNAVAILABLE
        details: failed to connect to all addresses; last error: INTERNAL: ipv4:{INGRESS_IP}:80: Trying to connect an http1.x server
        Debug string UNKNOWN:failed to connect to all addresses; last error: INTERNAL: ipv4:{INGRESS_IP}:80: Trying to connect an http1.x server {grpc_status:14, created_time:"2023-07-20T21:19:54.817303+02:00"}
Using this config (tried a couple of variations for the endpoint but no luck):
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///{flyte_ingress_host}:80
  authType: Pkce
  insecure: true
  insecureSkipVerify: true
logger:
  show-source: true
  level: 0
h
In flyte-core, the console seems to be picking up the wrong env var:
Please open http://undefined:8080/console
if I add
configmap:

  # -- Configuration for Flyte console UI
  console:
    BASE_URL: /console
    CONFIG_DIR: /etc/flyte/config
    ADMIN_API_URL: flyteadmin
then console log shows:
Please open http://flyteadmin:8080/console
y
even though it’s insecure true it’s still grpc under the hood. so you will need an http/2 network.
check to make sure that the alpn is set correctly. and this is a tcp elb right?
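One way to check the "http/2 network" point on a plaintext port is to send the HTTP/2 client preface and look at the first bytes that come back. The sketch below is only an illustration using the Python standard library; `looks_like_http2` and `probe` are hypothetical helper names, and the host and port you pass in are your own. A gRPC server is expected to answer with a SETTINGS frame, while an HTTP/1.x hop usually answers with text such as `HTTP/1.1 400 ...`, which would be consistent with the "Trying to connect an http1.x server" error.

```python
import socket

# HTTP/2 client connection preface, followed by an empty SETTINGS frame
# (length=0, type=0x04, flags=0, stream id=0).
CLIENT_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
EMPTY_SETTINGS = b"\x00\x00\x00\x04\x00\x00\x00\x00\x00"


def looks_like_http2(reply: bytes) -> bool:
    """Return True if the reply starts with an HTTP/2 SETTINGS frame header."""
    if len(reply) < 9:
        return False
    frame_type = reply[3]
    # Stream id is the low 31 bits of bytes 5..8; SETTINGS always uses stream 0.
    stream_id = int.from_bytes(reply[5:9], "big") & 0x7FFFFFFF
    return frame_type == 0x04 and stream_id == 0


def probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Send the h2 preface over plaintext TCP and classify the first 9 bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(CLIENT_PREFACE + EMPTY_SETTINGS)
        reply = b""
        while len(reply) < 9:
            chunk = sock.recv(9 - len(reply))
            if not chunk:
                break
            reply += chunk
    return looks_like_http2(reply)


# Example call (placeholder address, not from this thread):
# print(probe("flyte.example.com", 80))
```

This only covers cleartext HTTP/2 with prior knowledge; for a TLS listener the ALPN negotiation would need to be checked instead.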
h
in flyteadmin I have:
service:
    annotations:
      # Required for the ingress to properly route grpc traffic to grpc port
      cloud.google.com/app-protocols: '{"grpc":"HTTP2"}'
y
in flyte console you will need to tell it how to talk to admin.
check the actual thing
the load balancer in the gcp console
h
flyteadmin has a clusterIp and service as follows:
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/app-protocols: '{"grpc":"HTTP2"}'
    meta.helm.sh/release-name: flyte
    meta.helm.sh/release-namespace: flyte
    projectcontour.io/upstream-protocol.h2c: grpc
  creationTimestamp: "2023-07-17T21:11:12Z"
  labels:
    app.kubernetes.io/instance: flyte
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: flyteadmin
    helm.sh/chart: flyte-core-v1.8.0
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:cloud.google.com/app-protocols: {}
          f:meta.helm.sh/release-name: {}
          f:meta.helm.sh/release-namespace: {}
          f:projectcontour.io/upstream-protocol.h2c: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/instance: {}
          f:app.kubernetes.io/managed-by: {}
          f:app.kubernetes.io/name: {}
          f:helm.sh/chart: {}
      f:spec:
        f:externalTrafficPolicy: {}
        f:internalTrafficPolicy: {}
        f:ports:
          .: {}
          k:{"port":80,"protocol":"TCP"}:
            .: {}
            f:name: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
          k:{"port":81,"protocol":"TCP"}:
            .: {}
            f:name: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
          k:{"port":87,"protocol":"TCP"}:
            .: {}
            f:name: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
          k:{"port":10254,"protocol":"TCP"}:
            .: {}
            f:name: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
        f:selector: {}
        f:sessionAffinity: {}
        f:type: {}
    manager: terraform-provider-helm_v2.10.1_x5
    operation: Update
    time: "2023-07-20T19:16:28Z"
  name: flyteadmin
  namespace: flyte
  resourceVersion: "336301862"
  uid: 1d565b87-57b1-4eb1-9bd5-dfc1ddc21e32
spec:
  clusterIP: 10.182.10.251
  clusterIPs:
  - 10.182.10.251
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8088
  - name: grpc
    port: 81
    protocol: TCP
    targetPort: 8089
  - name: redoc
    port: 87
    protocol: TCP
    targetPort: 8087
  - name: http-metrics
    port: 10254
    protocol: TCP
    targetPort: 10254
  selector:
    app.kubernetes.io/instance: flyte
    app.kubernetes.io/name: flyteadmin
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
other components seem to be OK connecting to it
y
mmm
not sure. generally when i’m debugging network stuff i try in layers:
a - from within the flyte-binary pod itself, if you install curl, can you curl localhost:port/api/v1/projects
b - from a dummy pod running a stock ubuntu image or something with curl installed, in the flyte namespace, can you hit api/v1/projects with the addresses you think it should be accessible on
c - from outside the cluster but port-forwarding the service (not the pod), can you curl api/v1/projects on the addresses you think it should be accessible on
d - last step is the ingress/lb.
i’m assuming a-c works?
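The layered check in steps a through d can be scripted so that the first failing hop is reported. This is a rough Python sketch using only the standard library, not an official Flyte tool: `http_ok` and `first_failing_layer` are hypothetical names, the addresses in the commented example are placeholders, and it has to be run from whichever vantage point is being tested (inside a pod, through a port-forward, or against the load balancer).

```python
from http.client import HTTPException
from urllib.request import urlopen

# Path used in the curl checks above.
PROJECTS_PATH = "/api/v1/projects"


def http_ok(url, timeout=5.0):
    """Return True if a GET on url answers with a 2xx status."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, HTTPException):
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return False


def first_failing_layer(layers, fetch=http_ok):
    """Probe (name, base_url) pairs in order.

    Returns the name of the first layer whose projects endpoint does not
    answer successfully, or None if every layer responds.
    """
    for name, base_url in layers:
        if not fetch(base_url.rstrip("/") + PROJECTS_PATH):
            return name
    return None


# Example (placeholder addresses, adjust to your own setup):
# layers = [
#     ("service via port-forward", "http://localhost:8088"),
#     ("ingress / load balancer", "http://flyte.example.com"),
# ]
# print(first_failing_layer(layers))
```

Ordering the list from innermost to outermost keeps the result meaningful: everything before the reported layer answered, so the problem sits at that hop.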
h
I don't see flyte-binary being part of the flyte-core chart
datacatalog flyte-pod-webhook flyteadmin flyteconsole flytepropeller flytescheduler syncresources
these are the components I have with flyte-core
g
I think we are mixing 2 threads here
@Yee my nginx ingress controller has `use-http2: "true"` set, is there something else to check?
It's an NLB (layer 4); gRPC runs over TCP, right?
Is ALB required for this?
y
not required. tcp elb is good
forgot what that’s called but yeah l4
can you check the aws console for the lb that’s created?
just to make sure the dance was done correctly
g
I found the issue: the FlyteSystemRole trust relationship was not set up properly, as described here: https://docs.flyte.org/en/v1.3.0/deployment/aws/manual.html#oidc-provider-for-the-eks-cluster
It's a bit tricky to find, since I don't see that EKS manual setup in the latest docs.
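For reference, the `AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity` error generally means the role's trust policy does not allow the pod's service account through the cluster's OIDC provider. Below is a minimal Python sketch of what such a trust policy document typically looks like. `irsa_trust_policy` is a hypothetical helper, and the account ID, region, OIDC ID and the `flyteadmin` service account name are placeholder assumptions; the linked guide remains the authority on which service accounts and conditions Flyte actually needs.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build an IAM trust policy that lets one Kubernetes service account
    call sts:AssumeRoleWithWebIdentity via the cluster's OIDC provider.

    oidc_provider is the issuer URL without the https:// prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    }
                },
            }
        ],
    }


# Placeholder values for illustration only.
policy = irsa_trust_policy(
    account_id="111122223333",
    oidc_provider="oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE",
    namespace="flyte",
    service_account="flyteadmin",
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be compared against the role's trust relationship in the IAM console; a mismatch in the OIDC provider ID, namespace, or service account name is enough to produce the 403.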
d
@Geert if you're on EKS already, try following this guide: https://github.com/davidmirror-ops/flyte-the-hard-way/tree/main