Hi, I've tried to deploy flyte(-core) into AWS inf...
# flyte-support
m
Hi, I've tried to deploy flyte(-core) into AWS infrastructure (EKS) with https://github.com/unionai-oss/deploy-flyte ... I do not have ingress properly set up, so I'd like to rely on port-forwarding. However, I have no idea which service to port-forward for the console (is it svc/flyteconsole? That only shows an empty project page and no other info) and what I should port-forward for my ~/.flyte/config.yaml endpoint... I have tried
kubectl port-forward -n flyte svc/flyteadmin 30080:81
, now
pyflyte info
actually shows some information so it seems like it connects to something. But when I run the code example from https://github.com/unionai-oss/deploy-flyte/blob/main/environments/aws/flyte-core/README.md using
pyflyte run --remote hello_world.py my_wf
it tells me Running Execution on Remote. 00000 Running execution on remote. [✔️] Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/afr2zbgj2bkp5lfrv9sr to see execution in the console. Of course there's nothing on that page, and my localhost:8080 (where I have svc/flyteconsole:
kubectl port-forward -n flyte svc/flyteconsole 8080:80
) does not show any project info either
a
Hey @millions-plastic-44322 could you share the output of
kubectl get svc -n flyte
Flyte works fine behind port-forwarding (just no SSL), but there are two endpoints to expose: gRPC (what you already did) and HTTP (which can get you to the console)
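A rough sketch of what forwarding both endpoints could look like. The service names and service ports are taken from the `kubectl get svc -n flyte` listing posted in this thread; that port 81 is gRPC matches the forward already in use, while port 80 on flyteadmin being the HTTP API is an assumption. The local ports (30080, 30081, 8080), the `FLYTE_NAMESPACE` variable, and the helper function names are arbitrary choices for illustration, not part of Flyte or deploy-flyte. The functions only print the commands so nothing runs against the cluster by accident.

```shell
#!/usr/bin/env bash
# Sketch only: prints kubectl port-forward commands for the Flyte endpoints.
# FLYTE_NAMESPACE and both function names are hypothetical helpers.

FLYTE_NAMESPACE="${FLYTE_NAMESPACE:-flyte}"

# Print one port-forward command: <service> <local port> <service port>.
forward_cmd() {
  printf 'kubectl port-forward -n %s svc/%s %s:%s\n' \
    "$FLYTE_NAMESPACE" "$1" "$2" "$3"
}

# Print one command per endpoint.
flyte_forwards() {
  forward_cmd flyteadmin 30080 81    # gRPC, used by pyflyte via config.yaml
  forward_cmd flyteadmin 30081 80    # HTTP API (assumed to be service port 80)
  forward_cmd flyteconsole 8080 80   # console UI
}
```

Calling `flyte_forwards` prints three commands; each one blocks while it is forwarding, so run each in its own terminal. The console may still look empty if the browser cannot reach the admin HTTP API from the same origin as the UI, in which case an Ingress is the more reliable route.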
Actually, there's an issue and a fix the community contributed here. Sorry about it; let me port it to the deploy-flyte repo, so maybe it's just a matter of doing a
helm upgrade
afterwards
m
~$ kubectl get svc -n flyte
NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                          AGE
datacatalog         NodePort    10.255.119.30    <none>        88:31060/TCP,89:32534/TCP        30h
flyte-pod-webhook   ClusterIP   10.255.162.192   <none>        443/TCP                          30h
flyteadmin          ClusterIP   10.255.141.120   <none>        80/TCP,81/TCP,87/TCP,10254/TCP   30h
flyteconsole        ClusterIP   10.255.35.228    <none>        80/TCP                           30h
ah, interesting, let me check if I can adapt my
values-eks-core.yaml
with this...
a
It didn't work for me. I'll try a few more things. In the meantime, any chance you can try getting an nginx controller running in your cluster? I think the Terraform code should just create the Ingress resource and the controller would reconcile it. It's fine not to have a domain name; you can just use nip.io or similar, but that takes the port-forward tunneling out of the picture
m
I think EKS requires ALB as ingress, and I'm almost able to get it to work with an additional `alb.ingress.kubernetes.io/healthcheck-path: "/healthcheck"` annotation. However, this only fixes the HTTP routes; gRPC is still shown as Unhealthy: Health checks failed when looking at EC2 -> Load balancers in the AWS console
Actually, I think the gRPC ingress works only with an SSL setup? I tried to remove the SSL setup and got complaints about a missing certificate for a host (of course), but if I try to keep HTTP only, I get this complaint from the gRPC ingress: Listener protocol 'HTTP' is not supported with a target group with the protocol-version 'GRPC'
FYI, I was able to get the HTTP ingress working (without SSL) and now I even have the console content. Which is interesting; it seems it really was just about the right connection. (It required some changes to the values file, I might offer a PR later.) However, the gRPC ingress does not work without HTTPS, so I have to rely on port-forwarding for that
a
sorry, coming back to this. Yah, I forgot to mention that you could use NGINX but would have to resort to different Ingress annotations.
The Python gRPC client doesn't like self-signed certs, but it can work. What does your local
config.yaml
look like?
m
I have only default value of
admin:
  endpoint: dns:///localhost:30080
  insecure: true
while port-forwarding
svc/flyteadmin 30080:81
I think I'll try to use some self-signed cert to get grpc ingress working. Or switch to nginx, but I am not sure if EKS does not have some problem with it. Anyway my main problem (not being able to see console content) seems to be fixed for now so we can experiment with flyte finally...
Do I need AWS_* variables in the config.yaml file? I'm getting
flytekit.exceptions.system.FlyteDownloadDataException: SYSTEM:DownloadDataError: error=Failed to get data from <s3://mle-flyte-eks-dev/jiri-test/development/CKP5CYSUJ3EN6RPPQL2DI6I3MI======/fast1ddd21a0646d65740ab90568b7b0c46c.tar.gz> to ./ (recursive=False).
error in the pod where the workflow is running. Code was correctly uploaded to S3 (the path in bucket exists), but now I'm getting this download error. Is it something missing in deploy-flyte/aws setup?