# ask-the-community
c
I was previously able to access the console and execute workflows on an AWS EKS deployment. I had to redeploy yesterday. I can still access the console, but `pyflyte` executions are failing:
```
Failed with Exception Code: SYSTEM:Unknown
RPC Failed, with Status: StatusCode.INTERNAL
	details: failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
	status code: 403, request id: 5efc9c88-fdcb-42ab-bea8-8de7a79101e9
	Debug string UNKNOWN:Error received from peer  {grpc_message:"failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 5efc9c88-fdcb-42ab-bea8-8de7a79101e9", grpc_status:13, created_time:"2023-06-14T11:16:05.730571-07:00"}
```
d
Hi Cody. So this is without auth, right? Let's check whether IRSA is set up correctly. What's the output of:
```
aws iam get-role --role-name flyte-system-role --query Role.AssumeRolePolicyDocument
```
and also:
```
kubectl describe sa flyte-backend-flyte-binary -n flyte
```
c
```
An error occurred (NoSuchEntity) when calling the GetRole operation: The role with name flyte-system-role cannot be found.
```
```
Name:                flyte-backend-flyte-binary
Namespace:           flyte
Labels:              app.kubernetes.io/instance=flyte-backend
                     app.kubernetes.io/managed-by=Helm
                     app.kubernetes.io/name=flyte-binary
                     app.kubernetes.io/version=1.16.0
                     helm.sh/chart=flyte-binary-v1.6.2
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::xxxxxx:role/eksctl-flyte-cluster-cluster-ServiceRole-1M4CS3AC5LGV8
                     meta.helm.sh/release-name: flyte-backend
                     meta.helm.sh/release-namespace: flyte
Image pull secrets:  <none>
Mountable secrets:   <none>
Tokens:              <none>
Events:              <none>
```
d
OK, then:
```
aws iam get-role --role-name eksctl-flyte-cluster-cluster-ServiceRole-1M4CS3AC5LGV8 --query Role.AssumeRolePolicyDocument
```
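The role name passed to `get-role` here is the last segment of the ARN in the service account's `eks.amazonaws.com/role-arn` annotation. A minimal sketch of extracting it automatically, assuming a POSIX shell; `role_name_from_arn` is a hypothetical helper, and the commented usage assumes `kubectl` and the AWS CLI are already configured for this cluster.

```shell
#!/bin/sh
# role_name_from_arn (hypothetical helper): print the role name at the end of
# an IAM role ARN such as arn:aws:iam::123456789012:role/some-role.
# `aws iam get-role --role-name` expects this bare name, not the full ARN.
role_name_from_arn() {
  # Strip everything up to and including the last "/" (this also drops any IAM path).
  printf '%s\n' "${1##*/}"
}

# Example usage against the live cluster (requires kubectl and the AWS CLI):
#   arn=$(kubectl get sa flyte-backend-flyte-binary -n flyte \
#         -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}')
#   aws iam get-role --role-name "$(role_name_from_arn "$arn")" \
#     --query Role.AssumeRolePolicyDocument
```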
c
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
d
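Hmm, that's a problem. Apparently there's no web-identity trust relationship: this role only lets the `eks.amazonaws.com` service call `sts:AssumeRole`. It doesn't trust the cluster's OIDC provider for `sts:AssumeRoleWithWebIdentity`, which is the call the error says is denied.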
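A quick way to spot this when checking other roles is to look for both a `Federated` principal and the `sts:AssumeRoleWithWebIdentity` action in the trust policy. A minimal sketch, assuming a POSIX shell with `grep`; `has_web_identity_trust` is a hypothetical helper, and it does plain text matching on the JSON rather than real parsing.

```shell
#!/bin/sh
# has_web_identity_trust (hypothetical helper): read a trust policy JSON on
# stdin (e.g. the output of `aws iam get-role ... --query
# Role.AssumeRolePolicyDocument --output json`) and succeed only if it
# mentions both a Federated principal and sts:AssumeRoleWithWebIdentity.
# This is a crude text match, not a JSON parser.
has_web_identity_trust() {
  policy=$(cat)
  printf '%s\n' "$policy" | grep -q '"Federated"' &&
    printf '%s\n' "$policy" | grep -q '"sts:AssumeRoleWithWebIdentity"'
}

# Example usage (requires the AWS CLI):
#   aws iam get-role --role-name flyte-system-role \
#     --query Role.AssumeRolePolicyDocument --output json |
#     has_web_identity_trust && echo "web-identity trust present"
```

Run against the policy above, the check fails, because the only principal is the `eks.amazonaws.com` service.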
c
```
2023-06-14 11:36:30 [ℹ]  IAM Open ID Connect provider is already associated with cluster "flyte-cluster" in "us-west-2"
```
It looks all right in the console, as far as I can tell.
d
How did you create the role?
c
I created it a few months ago with the original deployment, following the "the hard way" docs, I believe.
d
OK, can you try running steps 1-3 in this section? You can use a different role name. If the trust relationship is right, we can then update the Helm chart values and upgrade the deployment.
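The steps David refers to aren't reproduced in this thread. As a hedged sketch of only the trust-policy part, assuming a POSIX shell: `irsa_trust_policy` is a hypothetical helper that prints a policy of the same shape as the one Cody posts later in the thread, scoped to a single service account. It covers only who may assume the role, not the S3 or other permissions the role grants.

```shell
#!/bin/sh
# irsa_trust_policy (hypothetical helper): print an IAM trust policy that lets
# one Kubernetes service account assume a role through the cluster's OIDC
# provider (IRSA). Arguments:
#   $1  AWS account ID
#   $2  OIDC issuer without the https:// prefix,
#       e.g. oidc.eks.us-west-2.amazonaws.com/id/<hash>
#   $3  namespace of the service account
#   $4  service account name
irsa_trust_policy() {
  account_id=$1 issuer=$2 namespace=$3 sa=$4
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/${issuer}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${issuer}:aud": "sts.amazonaws.com",
          "${issuer}:sub": "system:serviceaccount:${namespace}:${sa}"
        }
      }
    }
  ]
}
EOF
}

# Example usage (requires the AWS CLI; ACCOUNT_ID must be set by you):
#   issuer=$(aws eks describe-cluster --name flyte-cluster \
#            --query cluster.identity.oidc.issuer --output text)
#   irsa_trust_policy "$ACCOUNT_ID" "${issuer#https://}" flyte flyte-backend-flyte-binary > trust.json
#   aws iam create-role --role-name flyte-system-role \
#     --assume-role-policy-document file://trust.json
```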
c
Great. I've created the new role. Should I update the values now?
```
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::xxxxxxx:role/flyte-system-role"
```
d
But first, please verify that it has the proper trust relationship, like:
```
aws iam get-role --role-name <your-role-name> --query Role.AssumeRolePolicyDocument
```
c
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::xxxxxxxx:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/<hash>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-west-2.amazonaws.com/id/<hash>:aud": "sts.amazonaws.com",
          "oidc.eks.us-west-2.amazonaws.com/id/<hash>:sub": "system:serviceaccount:flyte:flyte-backend-flyte-binary"
        }
      }
    }
  ]
}
```
Same error after `helm upgrade` (the gRPC call still fails with a 403).
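When the action and principal look right, one more thing worth comparing is the `sub` condition against the service account shown by `kubectl describe sa` (namespace `flyte`, name `flyte-backend-flyte-binary`): if they differ, STS rejects the call with the same `AccessDenied` error. A minimal sketch, assuming a POSIX shell; `trust_allows_sa` is a hypothetical helper that only looks for the exact `StringEquals` subject string, so it will not recognize `StringLike` wildcards.

```shell
#!/bin/sh
# trust_allows_sa (hypothetical helper): read a trust policy JSON on stdin and
# succeed only if it contains the quoted IRSA subject string
# "system:serviceaccount:<namespace>:<name>" for the given service account.
# Arguments: $1 namespace, $2 service account name.
trust_allows_sa() {
  # -F: fixed-string match; the surrounding quotes prevent prefix matches.
  grep -qF "\"system:serviceaccount:$1:$2\""
}

# Example usage (requires the AWS CLI):
#   aws iam get-role --role-name flyte-system-role \
#     --query Role.AssumeRolePolicyDocument --output json |
#     trust_allows_sa flyte flyte-backend-flyte-binary && echo "subject matches"
```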
d
OK, and you're not using auth, right?
c
Right
d
What are the contents of `$HOME/.flyte/config.yaml`?
c
```
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///<domain-name>.com
  authType: Pkce
  insecure: false
logger:
  show-source: true
  level: 0
```
We use our own custom domain name.
I suppose I could start completely fresh.
Got it. Thanks for the help @David Espejo (he/him)!
d
so, is it working?
e
@Cody Scandore how did you get it working? I'm currently having the same issue 😕
c
In my particular case it was related to IRSA, as David mentioned. I was deploying the flyte-binary chart. Previously I was not using the `customData.<env>.defaultIamRole` field in `values.yaml`, and executions still worked fine. I needed to uncomment those fields and redeploy.
```
cluster_resources:
  customData:
#   - production:
#       - defaultIamRole:
#           value: arn:aws:iam::<AWS-ACCOUNT-ID>:role/flyte-system-role
#   - staging:
#       - defaultIamRole:
#           value: arn:aws:iam::<AWS-ACCOUNT-ID>:role/flyte-system-role
#   - development:
#       - defaultIamRole:
#           value: arn:aws:iam::<AWS-ACCOUNT-ID>:role/flyte-system-role
```
It's probably worth going through steps 1-3 in both sections of David's guide, whether or not your issue is the same.
e
Are these under the `inline` key in your `values.yaml`?
c
Yes, under `inline.cluster_resources`.
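Cody's fix amounts to uncommenting the `cluster_resources.customData` fragment shown earlier so that each project domain gets a `defaultIamRole`. A minimal sketch that prints the uncommented version, assuming one role ARN is used for all three domains as in that commented-out example; `default_iam_role_values` is a hypothetical helper, and its output still has to be placed under `inline` in `values.yaml` at the matching indentation.

```shell
#!/bin/sh
# default_iam_role_values (hypothetical helper): print the cluster_resources
# fragment with a defaultIamRole entry for the production, staging and
# development domains, all pointing at the same role ARN ($1).
default_iam_role_values() {
  role_arn=$1
  echo "cluster_resources:"
  echo "  customData:"
  for domain in production staging development; do
    echo "    - ${domain}:"
    echo "        - defaultIamRole:"
    echo "            value: ${role_arn}"
  done
}

# Example usage:
#   default_iam_role_values "arn:aws:iam::$ACCOUNT_ID:role/flyte-system-role"
```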