# ask-the-community

Panos Strouth

10/25/2022, 9:27 AM
Hi everyone, I am new to K8S and Flyte, but I managed to install Flyte on EKS by following https://docs.flyte.org/en/latest/deployment/aws/manual.html and was able to access it using flytectl. Unfortunately, when I try to use pyflyte to execute a workflow remotely, I get the following error:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
status code: 403, request id: 88d09420-d2e3-4772-8767-83cff32d91af"
debug_error_string = "UNKNOWN:Error received from peer ipv4:xx.xx.xx.xx:443 {grpc_message:"failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403
This looks like an IRSA (IAM Roles for Service Accounts) problem. The installation guide suggests attaching IAM roles to the EC2 nodes. I decided to use IRSA instead because I think it is the correct way to give permissions to applications. With node-wide roles, every application running on the instance gets the role's permissions. With IRSA, an IAM role can only be assumed by pods running under specific service accounts in specific namespaces, which gives more fine-grained control. But as I said, I am still a K8S beginner, so I have no strong opinion. My IAM setup has two roles: flyte-user-role and iam-role-flyte. Both roles have full S3 permissions. The most important part is the trust policy. Since I use IRSA, both roles have the following trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::xxxxxxxx:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/yyyyyy"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-central-1.amazonaws.com/id/yyyyyy:aud": "sts.amazonaws.com",
          "oidc.eks.eu-central-1.amazonaws.com/id/yyyyyy:sub": "system:serviceaccount:flyte:default"
        }
      }
    }
  ]
}
Note the “flyte” namespace in the Condition. My Flyte services run in the “flyte” namespace, so they should be able to assume the above roles. I suspect the problem lies in the IAM trust policies: some Flyte service does not have the permissions required to assume the IAM role. Has anyone faced a similar issue? Any help is appreciated!
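To see why a pod outside the “flyte” namespace would get `AccessDenied` from this policy, here is a deliberately simplified Python model of how the `StringEquals` condition is matched against the `sub` and `aud` claims of a pod's service account token. It is an illustration only, assuming just `StringEquals` with string or list values; real evaluation is done by AWS STS and supports many more operators. The helper names `trust_policy_allows` and `claims_for` are hypothetical, and the OIDC host is the redacted placeholder from the policy above.

```python
from typing import Dict, List, Union

# Redacted placeholder OIDC provider host, copied from the trust policy above.
OIDC_HOST = "oidc.eks.eu-central-1.amazonaws.com/id/yyyyyy"

# The StringEquals block of the trust policy, as a Python dict.
STRING_EQUALS: Dict[str, Union[str, List[str]]] = {
    f"{OIDC_HOST}:aud": "sts.amazonaws.com",
    f"{OIDC_HOST}:sub": "system:serviceaccount:flyte:default",
}


def trust_policy_allows(
    string_equals: Dict[str, Union[str, List[str]]],
    claims: Dict[str, str],
    oidc_host: str,
) -> bool:
    """Simplified check: every key must match its token claim (keys are ANDed).

    A list value matches if the claim equals any entry (values are ORed).
    """
    prefix = oidc_host + ":"
    for key, expected in string_equals.items():
        # Keys look like "<oidc-host>:<claim>", e.g. "...:sub".
        if not key.startswith(prefix):
            return False  # keys for other providers never match in this sketch
        claim_name = key[len(prefix):]
        allowed = [expected] if isinstance(expected, str) else expected
        if claims.get(claim_name) not in allowed:
            return False
    return True


def claims_for(namespace: str, service_account: str) -> Dict[str, str]:
    """Build the sub/aud claims an EKS pod token carries (simplified)."""
    return {
        "sub": f"system:serviceaccount:{namespace}:{service_account}",
        "aud": "sts.amazonaws.com",
    }


if __name__ == "__main__":
    for ns in ["flyte", "flytesnacks-development"]:
        ok = trust_policy_allows(STRING_EQUALS, claims_for(ns, "default"), OIDC_HOST)
        print(f"{ns}: {'allowed' if ok else 'AccessDenied'}")
```

Running it prints `flyte: allowed` and `flytesnacks-development: AccessDenied`: the `sub` claim must name both the namespace and the service account exactly.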

Rahul Mehta

10/25/2022, 6:47 PM
We're also using IRSA + EKS, and have only been creating the IAM policy manually in the console, then using `eksctl create iamserviceaccount` to provision the service accounts. This should take care of setting up the trust relationship etc. via a CloudFormation stack, and might be less error-prone than doing it manually. Is it possible for you to use `eksctl` here? It seems like you're already using EKS.

Panos Strouth

10/25/2022, 7:21 PM
Thanks for your reply @Rahul Mehta. I deployed my EKS cluster using Terraform, with the popular `terraform-aws-modules/eks/aws` module (so far, there is no official Terraform module for EKS). The creation of the IRSA roles is also automated in my implementation, using the `terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks` module. I have deployed multiple EKS plugins successfully with this module. Nothing is manual; everything is done via Terraform.
Could someone tell me the output of `kubectl get sa -n flyte`, given that Flyte is installed in a namespace called "flyte"? For me the output is:
NAME               SECRETS   AGE
datacatalog        1         13d
default            1         13d
flyte-pod-webhook  1         13d
flyteadmin         1         13d
flytepropeller     1         13d
flytescheduler     1         13d
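When comparing `kubectl get sa` output, it can also help to see which service accounts actually carry an IRSA role: EKS reads the `eks.amazonaws.com/role-arn` annotation on a service account to decide which role its pods assume. Below is a minimal sketch, assuming the official `kubernetes` Python client and a working kubeconfig; the helper names `role_arns` and `list_flyte_service_accounts` are hypothetical. The pure `role_arns` function is kept separate so it can be tried without a cluster.

```python
from typing import Dict, Iterable, Mapping, Optional, Tuple

# Annotation that EKS uses to bind a service account to an IAM role (IRSA).
IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"


def role_arns(
    service_accounts: Iterable[Tuple[str, Optional[Mapping[str, str]]]],
) -> Dict[str, Optional[str]]:
    """Map each service account name to its IRSA role ARN (None if unset)."""
    result: Dict[str, Optional[str]] = {}
    for name, annotations in service_accounts:
        # Kubernetes returns None when a service account has no annotations.
        result[name] = (annotations or {}).get(IRSA_ANNOTATION)
    return result


def list_flyte_service_accounts(namespace: str = "flyte") -> Dict[str, Optional[str]]:
    """Query the cluster; needs the `kubernetes` package and a kubeconfig."""
    from kubernetes import client, config  # imported lazily: optional dependency

    config.load_kube_config()
    sa_list = client.CoreV1Api().list_namespaced_service_account(namespace)
    return role_arns((sa.metadata.name, sa.metadata.annotations) for sa in sa_list.items)
```

Calling `list_flyte_service_accounts()` against the cluster would return one entry per service account in the listing above, with `None` for any account that has no role annotation.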
I also noticed that I have the following namespaces created in my EKS cluster:
default                    Active  14d
flyte                      Active  13d
flyteexamples-development  Active  13d
flyteexamples-production   Active  13d
flyteexamples-staging      Active  13d
flytesnacks-development    Active  13d
flytesnacks-production     Active  13d
flytesnacks-staging        Active  13d
flytetester-development    Active  13d
flytetester-production     Active  13d
flytetester-staging        Active  13d
kube-node-lease            Active  14d
kube-public                Active  14d
kube-system                Active  14d
But only the "flyte" namespace is authorized to assume the IAM roles. Do I have to authorize all Flyte-related namespaces to assume the roles? I am just trying to run the following command remotely:
pyflyte run --remote example.py wf --n 500 --mean 42 --sigma 2
Problem solved. When using IRSA, keep in mind that every Flyte project gets its own K8S namespace per domain (for example flytesnacks-development, flytesnacks-staging and flytesnacks-production). This means that every time you create a new project, you have to explicitly add the new project's namespaces to the IAM role trust policy. In my case the trust policy was:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::xxxxxxx:oidc-provider/oidc.eks.eu-central-1.amazonaws.com/id/xxxxxx"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.eu-central-1.amazonaws.com/id/xxxxxx:sub": [
            "system:serviceaccount:flyte:default"
          ],
          "oidc.eks.eu-central-1.amazonaws.com/id/xxxxx:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
Note that I have a Condition in the trust policy. The Condition allows only pods in the "flyte" namespace running under the "default" service account to assume the role. If you want all your projects (and their namespaces) to be able to assume the role, then you do not need the Condition!
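The manual step described above (adding each new project's namespaces to the trust policy) can also be scripted while keeping the Condition. Below is a minimal Python sketch that lists one `sub` entry per project-domain namespace, assuming the `<project>-<domain>` naming visible in the namespace list above, the `default` service account, and the three domains development, staging and production. The function name `build_trust_policy`, its parameters, and the extra `flyte` namespace entry are illustrative assumptions, and the account and OIDC IDs are the redacted placeholders from the policy above. The resulting JSON would still have to be applied to the role, for example through the Terraform setup mentioned earlier.

```python
import json
from typing import Dict, Iterable, List

DOMAINS = ("development", "staging", "production")


def build_trust_policy(
    account_id: str,
    oidc_host: str,
    projects: Iterable[str],
    domains: Iterable[str] = DOMAINS,
    extra_namespaces: Iterable[str] = ("flyte",),
    service_account: str = "default",
) -> Dict:
    """Return an IRSA trust policy allowing `service_account` in every namespace."""
    domains = list(domains)  # materialize: it is iterated once per project
    namespaces: List[str] = list(extra_namespaces)
    # One namespace per project-domain pair, e.g. flytesnacks-development.
    namespaces += [f"{p}-{d}" for p in projects for d in domains]
    subjects = [f"system:serviceaccount:{ns}:{service_account}" for ns in namespaces]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_host}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_host}:aud": "sts.amazonaws.com",
                        # A list matches if the token's sub equals any entry.
                        f"{oidc_host}:sub": subjects,
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = build_trust_policy(
        account_id="xxxxxxx",
        oidc_host="oidc.eks.eu-central-1.amazonaws.com/id/xxxxxx",
        projects=["flyteexamples", "flytesnacks", "flytetester"],
    )
    print(json.dumps(policy, indent=2))
```

One design consideration: IAM caps the size of a role's trust policy, so with many projects a `StringLike` pattern with a `*` wildcard in the namespace position may be more compact than an explicit list, at the cost of broader access.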
Maybe this is the reason why the documentation suggests Node-level IAM roles instead of IRSA.

Rahul Mehta

10/26/2022, 5:29 PM
Ah, got it. We currently have some automation to set this up with IRSA, but it still requires a manual action whenever we decide to add a new project. We figured that would be pretty infrequent after the initial setup, and those steps could be well documented for our infra team to handle.