# flyte-deployment
Hello everyone. We’re deploying flyte-binary on EKS. The pod is running and connecting to the database. I am able to load the dashboard, but when I try to run a simple workflow I get the following error:
Failed with Exception Code: SYSTEM:Unknown
RPC Failed, with Status: StatusCode.INTERNAL
        details: failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials
caused by: RequestError: send request failed
caused by: Post https://sts.us-east-2.amazonaws.com/: dial tcp i/o timeout
        Debug string UNKNOWN:Error received from peer ipv4: {created_time:"2023-05-30T12:31:18.554380596+00:00", grpc_status:13, grpc_message:"failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post https://sts.us-east-2.amazonaws.com/: dial tcp i/o timeout"}
Other pods on the same EKS cluster, in the same namespace and using the same service account, are able to use awscli.
can you try creating a link manually?
may need to spin up an awscli/python pod with the same credentials as flyte-binary.
haven’t seen this before. timeout is strange.
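For the debug pod suggested above, one quick way to confirm it carries the same web identity credentials as flyte-binary is to inspect the environment variables the AWS SDKs read for web identity. This is a minimal Python sketch, assuming the pod uses IAM roles for service accounts (which the WebIdentityErr suggests); the helper name `web_identity_summary` is hypothetical.

```python
import os


def web_identity_summary(env=None):
    """Report the web identity settings the AWS SDKs read from the environment."""
    env = os.environ if env is None else env
    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    return {
        "role_arn": env.get("AWS_ROLE_ARN"),
        "token_file": token_file,
        # The token is expected to be a file mounted into the pod.
        "token_file_exists": bool(token_file) and os.path.isfile(token_file),
        "region": env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION"),
    }


if __name__ == "__main__":
    for key, value in web_identity_summary().items():
        print(f"{key}: {value}")
```

Running this in both the debug pod and the flyte-binary pod and comparing the output should show whether the two really share the same role and token setup.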
Yes, indeed. Let me try something
What did you mean by creating a link manually?
It's probably because your RDS and EKS are not in the same VPC
They actually are, but can you elaborate on your thought process?
like make a pod with awscli
and exec into it, and call
aws s3 presign
I am able to create a pre-signed URL with the AWS CLI using the same service account, but I am still getting the same timeout error above when calling pyflyte run
the timeout is coming from the admin side.
this is pretty weird. i’m not sure. this feels like just a networking issue. the go code in admin isn’t doing anything special.
if configured incorrectly it should get a 403 or a 401 or something, not a timeout
this feels like the flyte/admin pod isn’t able to talk to sts.
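One way to test that hypothesis is to check plain TCP reachability of the STS endpoint from inside the pod, independent of any credentials. This is a minimal Python sketch using only the standard library; the function name `can_reach` and the 5-second default timeout are assumptions for illustration.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_reach("sts.us-east-2.amazonaws.com")` from the flyte-binary pod and getting `False`, while the same call returns `True` from a pod where awscli works, would point at networking rather than IAM configuration.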