Hey! Do anyone know if it is possible/where to se...
# ask-the-community
r
Hey! Does anyone know whether it is possible, and if so where, to set the session duration on the role that is assumed when running a workflow? I have a task that runs for some time (~5 min), and then I get this error:
flytekit.exceptions.scopes.FlyteScopedUserException: An error occurred (ExpiredToken) when calling the AssumeRole operation: The security token included in the request is expired
and after a while it also prints this error:
Called process exited with error code: 1.  Stderr dump:\n\nb'upload failed: ../../tmp/flyte-kvw3xxto/sandbox/local_flytekit/engine_dir/error.pb to s3://<s3_bucket>/metadata/propeller/<project_name>-development-<execution_id>/n1/data/0/error.pb An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
I trigger this workflow with a launch plan that specifies a specific service account (not the default). It was created like this:
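from flytekit import LaunchPlan
from flytekit.models.security import Identity, SecurityContext

# Run the workflow's pods under a specific Kubernetes service account.
security_context = SecurityContext(
    run_as=Identity(
        iam_role=None,
        k8s_service_account="my-aws-role",
    ),
)

LaunchPlan.get_or_create(
    name="my_lp",
    workflow=my_wf,
    security_context=security_context,
)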
Has anyone faced the same error before?
s
@Prafulla Mahindrakar, do you know how to set the session duration?
@Robin Eklund
Also, can you verify as well that the assumed IAM role has the PutObject permission? You can do this by checking the role annotated on the service account:
kubectl get sa -n <project_name>-development my-aws-role -o yaml
r
@Samhita Alla and @Prafulla Mahindrakar, thanks for your responses! The article says the default session duration is one hour, but I get this error after ~5 min 🤔 I ran the command to check permissions, and this is what I get:
apiVersion: v1
automountServiceAccountToken: true
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<aws_account_id>:role/eks/MyAwsRole
  creationTimestamp: "2022-10-28T11:24:50Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:automountServiceAccountToken: {}
      f:metadata:
        f:annotations:
          .: {}
          f:eks.amazonaws.com/role-arn: {}
    manager: HashiCorp
    operation: Update
    time: "2022-10-28T11:24:50Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:secrets:
        .: {}
        k:{"name":"my-aws-role-token-pvz2l"}: {}
    manager: kube-controller-manager
    operation: Update
    time: "2022-10-28T11:24:50Z"
  name: my-aws-role
  namespace: <project_name>-development
  resourceVersion: "20306065"
  uid: <UUID>
secrets:
- name: my-flyte-role-token-pvz2l
Does it look OK to you, or what should I verify here? I think the error on the put request happens because the session token has expired?
Based on how the role is created, these are its permissions:
s3:GetBucketLocation
s3:GetObject
s3:ListBucket
s3:ListBucketMultipartUploads
s3:ListMultipartUploadParts
s3:AbortMultipartUpload
s3:PutObject
s3:DeleteObject
s3:ListAllMyBuckets
I also verified that the role has a 1-hour max session duration.
p
That looks OK. Is there a log indicating that an upload succeeded before the timeout? You can also verify this by checking whether the bucket you are uploading to contains any objects. If there are no objects, we can dig further on the permissions side. Another thing to check is that the role is in fact being used by the Flyte pods. It should be the same one you specified in the launch plan, but it would be good to confirm. You can check this by running
kubectl get pod -n <project_name>-development <execution_id>-* -o yaml
and checking the serviceAccount field. Otherwise we have to dig more into why STS is issuing a session token with a reduced lifetime.
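For anyone who wants to script the service account check, here is a minimal Python sketch. It assumes `kubectl` is on the PATH and that the pod names start with the execution ID; the function names `fetch_pods` and `pod_service_accounts` are made up for illustration and are not part of Flyte or kubectl.

```python
import json
import subprocess


def fetch_pods(namespace: str) -> dict:
    """Return the parsed output of `kubectl get pods -n <namespace> -o json`."""
    completed = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


def pod_service_accounts(pod_list: dict, name_prefix: str = "") -> dict:
    """Map pod name -> spec.serviceAccountName for pods whose name starts with name_prefix."""
    accounts = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        if name.startswith(name_prefix):
            accounts[name] = pod.get("spec", {}).get("serviceAccountName")
    return accounts


# Example usage (hypothetical namespace and execution ID):
# pods = fetch_pods("<project_name>-development")
# print(pod_service_accounts(pods, "<execution_id>"))
```

Filtering by prefix in Python sidesteps shell wildcards in the pod name, which `kubectl get pod` does not expand on its own.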
Also, do the permissions you listed for the role apply to all buckets in your account?
r
There are lots of subfolders inside that bucket, so I guess the put permission is correct. When I run shorter tasks (< 5 min), I don't get this error.
I have verified this before by running
aws sts get-caller-identity
inside a task, and the right role is being used.
The command you sent didn't find the execution, but I ran it without the filter and then found it, and yes, it is using the correct role.
I assume I should check the
AWS_ROLE_ARN
value?
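The same check can be done from Python inside a task. This is only a sketch: it assumes `boto3` is installed in the task image, and the helper names (`role_name_from_arn`, `same_role`, `describe_identity`) are hypothetical. It compares the role behind the current STS identity with the `AWS_ROLE_ARN` environment variable that the EKS pod identity webhook injects.

```python
import os


def role_name_from_arn(arn: str) -> str:
    """Extract the role name from an IAM role ARN or an STS assumed-role ARN."""
    resource = arn.split(":", 5)[5]
    parts = resource.split("/")
    if parts[0] == "assumed-role":
        # arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
        return parts[1]
    if parts[0] == "role":
        # arn:aws:iam::<account>:role/<optional/path/><role-name>
        return parts[-1]
    raise ValueError(f"not a role ARN: {arn}")


def same_role(caller_arn: str, configured_arn: str) -> bool:
    """True if both ARNs refer to the same role name."""
    return role_name_from_arn(caller_arn) == role_name_from_arn(configured_arn)


def describe_identity() -> dict:
    """Compare the live STS identity with AWS_ROLE_ARN (needs AWS credentials)."""
    import boto3  # assumption: boto3 is available in the task image

    caller_arn = boto3.client("sts").get_caller_identity()["Arn"]
    configured_arn = os.environ.get("AWS_ROLE_ARN")
    return {
        "caller_arn": caller_arn,
        "configured_role_arn": configured_arn,
        "match": configured_arn is not None and same_role(caller_arn, configured_arn),
    }
```

Calling `describe_identity()` from a task and logging the result would show whether the pod is really running as the annotated role.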
p
That's strange. Yes, the AWS_ROLE_ARN that you have configured for accessing the bucket.
For executions shorter than 5 min, do you see them here
s3://<s3_bucket>/metadata/propeller/
and does the upload fail only for those longer than 5 min?
r
Let me double check the metadata folder
So there are lots of subfolders for different executions under /metadata/propeller, but not one for the execution that failed with this error.
p
OK, then it is definitely an issue with long-running executions. We haven't seen this in our community AFAIK. Are you using a non-default IdP in your EKS cluster?
cc: @Haytham Abuelfutuh
r
We use the Flyte IdP for flytectl and all other internal services. Service accounts are connected to IAM with OIDC.
Btw, @Prafulla Mahindrakar, if a call would make debugging easier, I would be up for that. And thanks again for taking the time!
@Prafulla Mahindrakar @Haytham Abuelfutuh we had some logging issues (not related to flyte) - which were the confusing part. Actually the job was running for 1 hour before we got this error - so we will just increase the session duration on the role within AWS. Thanks for taking your time 🙏
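For reference, raising the role's maximum session duration can be scripted with boto3's IAM `update_role` call. A minimal sketch, assuming `boto3` is installed and the caller has `iam:UpdateRole`; the helper names are hypothetical. IAM accepts values from 3600 to 43200 seconds, and `RoleName` is the bare role name without its path (for example `MyAwsRole`, not `eks/MyAwsRole`).

```python
MIN_SESSION_SECONDS = 3600   # 1 hour, the lowest value IAM accepts
MAX_SESSION_SECONDS = 43200  # 12 hours, the highest value IAM accepts


def validate_max_session_duration(seconds: int) -> int:
    """Return seconds unchanged if it is within the range IAM accepts, else raise."""
    if not MIN_SESSION_SECONDS <= seconds <= MAX_SESSION_SECONDS:
        raise ValueError(
            f"MaxSessionDuration must be between {MIN_SESSION_SECONDS} "
            f"and {MAX_SESSION_SECONDS} seconds, got {seconds}"
        )
    return seconds


def set_max_session_duration(role_name: str, seconds: int) -> None:
    """Update the role's MaxSessionDuration (needs AWS credentials)."""
    import boto3  # assumption: boto3 is installed where this runs

    boto3.client("iam").update_role(
        RoleName=role_name,
        MaxSessionDuration=validate_max_session_duration(seconds),
    )


# Example usage (hypothetical role name):
# set_max_session_duration("MyAwsRole", 4 * 3600)
```

Note that this only raises the cap. Whatever requests the credentials still has to ask for a longer session, since STS defaults to one hour when no duration is requested.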
a
On EKS, it's possible to set the session duration on the service account via an annotation: https://github.com/aws/amazon-eks-pod-identity-webhook, provided your role is also configured with the correct max session duration.
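As a rough illustration, this Python sketch builds such a service account manifest. It assumes the relevant annotation is `eks.amazonaws.com/token-expiration`, which the webhook README documents as the lifetime in seconds of the projected service account token; whether that alone resolves a given ExpiredToken error is not confirmed here. The function name and the 7200-second value are illustrative only.

```python
import json


def service_account_manifest(
    name: str, namespace: str, role_arn: str, token_expiration_seconds: int
) -> dict:
    """Build a ServiceAccount manifest with the EKS role and token-expiration annotations."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                "eks.amazonaws.com/role-arn": role_arn,
                # Annotation values must be strings, so the number is converted.
                "eks.amazonaws.com/token-expiration": str(token_expiration_seconds),
            },
        },
    }


manifest = service_account_manifest(
    name="my-aws-role",
    namespace="<project_name>-development",
    role_arn="arn:aws:iam::<aws_account_id>:role/eks/MyAwsRole",
    token_expiration_seconds=7200,  # illustrative value
)
# kubectl accepts JSON manifests, so this output can be piped to `kubectl apply -f -`.
print(json.dumps(manifest, indent=2))
```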
p
Thanks for sharing this @Andrew Korzhuev, this can then probably be synced through the cluster resource controller if enabled along with the role annotations on the service account.