• Hampus Rosvall

    3 months ago
    Hey, what is the preferred way of providing project-specific IAM roles to a workflow? I am trying to run a workflow using an IAM role with permissions suitable only for that project. In my launch plan I enter a role (e.g., as in the picture below), but it doesn’t seem to be picked up when I describe the Pod. Does your default role (e.g., eks-flyte-user-role) assume the IAM role you provide in the launch plan, or is it assumed by some service account that gets created?
    Environment:
          ...
          AWS_ROLE_ARN:                       arn:aws:iam::<account_id>:role/eks-flyte-user-role
  • So basically the Flyte User Role ends up being assumed by default. Should I create a new role that the flyte user role will assume with the correct permissions, or do you prefer creating a service account and let the SA assume the project role?
  • Prafulla Mahindrakar

    3 months ago
    We currently don't have project-level roles; they are defined at the project-domain level and come from the values file here: https://github.com/flyteorg/flyte/blob/76e865ae46c7fd391cfd3903a5aa860e64575a81/charts/flyte-core/values-eks.yaml#L385 If you have given that role full S3 permissions, the pods should be able to use it when Flyte launches the user pods. This also assumes you have set up the trust relationship for those roles: https://docs.flyte.org/en/latest/deployment/aws/manual.html#oidc-provider-for-the-eks-cluster What error do you get?
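For reference, the trust relationship mentioned above can be sketched as an IAM trust policy that lets a single Kubernetes service account assume a role through the cluster's OIDC provider. This is a minimal illustration, not Flyte's own tooling: the helper name, the account ID, the OIDC provider ID, and the service account name are placeholder assumptions, and the namespace follows the project-domain pattern seen in this thread.

```python
import json


def build_irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build a trust policy allowing one service account to assume the role.

    All arguments are illustrative placeholders; substitute the values
    for your own AWS account and EKS cluster.
    """
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:sub": subject,
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


# Hypothetical values for illustration only.
policy = build_irsa_trust_policy(
    account_id="111122223333",
    oidc_provider="oidc.eks.us-east-2.amazonaws.com/id/EXAMPLE",
    namespace="hackday-development",
    service_account="default",
)
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of document that would be attached to the role as its trust policy.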
  • Hampus Rosvall

    3 months ago
    I am not getting any errors; I am just wondering how I should handle a different role per (project, domain). Say a developer wants to create a new project with Flyte; the developer should then use a role with permissions specific to that project. Is the way forward to create an IAM role with a trust relationship for OIDC authentication with AWS and then override it in the launch plan?
  • Since there is an option to provide both IAM Role and Service Account, I was wondering what you considered to be the best practice here
  • auth_role = AuthRole(assumable_iam_role="my:iam:role")
    launch_plan.LaunchPlan.get_or_create(
        workflow=wf,
        name="your_lp_name_3",
        auth_role=auth_role,
    )
    In the above example, is the IAM role assumed by the flyte-user-role (i.e., I need an additional trust relationship), or is the flyte-user-role overridden with my:iam:role?
  • Prafulla Mahindrakar

    3 months ago
    You can add it to the launch plan, you can add it at execution creation time when you launch from the console by clicking Advanced Options, or you can provide it at the project-domain level using flytectl if you are using the latest admin and flytectl (docs for this are in review: https://github.com/flyteorg/flytectl/pull/316).
  • my:iam:role will override the flyte-user-role.
  • So the order of precedence is:
      1. directly on the execution request
      2. then on the launch plan
      3. the flytectl-created matchable attribute (PR docs above)
      4. application defaults from the values YAML file
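The precedence order described above can be modelled as "first configured layer wins". The sketch below only illustrates that rule; it is not flyteadmin's actual resolution code, and the function name and layer labels are made up for the example.

```python
from typing import Optional, Sequence, Tuple


def resolve_iam_role(
    layers: Sequence[Tuple[str, Optional[str]]],
) -> Tuple[str, Optional[str]]:
    """Return the first (source, role) pair whose role is set.

    `layers` must be ordered from highest to lowest precedence.
    """
    for source, role in layers:
        if role:
            return source, role
    return "none", None


# Layers in the order given in the thread; the values are illustrative.
layers = [
    ("execution request", None),
    ("launch plan", "my:iam:role"),
    ("matchable attribute", None),
    ("values file default", "arn:aws:iam::<account_id>:role/eks-flyte-user-role"),
]

source, role = resolve_iam_role(layers)
print(f"{role} (from {source})")
```

Here the launch plan value wins because nothing was set on the execution request; the values file default would apply only if every higher layer were empty.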
  • Regarding best practice, I have seen users use both, but I will defer to @Haytham Abuelfutuh on what is considered best practice.
  • Hampus Rosvall

    3 months ago
    It seems in the docs that you specify the service account rather than IAM role, but maybe you can do a similar operation for the IAM role?
    security_context:
      run_as:
        k8s_service_account: default
  • Prafulla Mahindrakar

    3 months ago
    Yes, that example only shows the k8s service account, but you can do the same for the IAM role.
  • Also, use SecurityContext instead of AuthRole, as AuthRole is deprecated.
  • Hampus Rosvall

    3 months ago
    So I tried to create a role and specify it directly in the launch plan through the UI. However, when I describe the pod it still uses the default role
  • Prafulla Mahindrakar

    3 months ago
    Are you using the security context to specify it in the launch plan? Can you share your launch plan spec using flytectl get launchplan -p project -d domain <name> --latest -o yaml? Also, which admin version are you using?
  • Hampus Rosvall

    3 months ago
    I am using the 0.19.4 Helm release
  • ~  flytectl get launchplan -p hackday -d development --latest -o yaml
    - closure:
        createdAt: "2022-04-25T09:23:55.415089Z"
        expectedInputs: {}
        expectedOutputs: {}
        updatedAt: "2022-04-25T09:23:55.415089Z"
      id:
        domain: development
        name: flyte.workflows.workflow.my_wf
        project: hackday
        resourceType: LAUNCH_PLAN
        version: v0.0.5
      spec:
        annotations: {}
        defaultInputs: {}
        entityMetadata: {}
        fixedInputs: {}
        labels: {}
        rawOutputDataConfig: {}
        workflowId:
          domain: development
          name: flyte.workflows.workflow.my_wf
          project: hackday
          resourceType: WORKFLOW
          version: v0.0.5
    - closure:
        createdAt: "2022-04-08T13:17:18.634988Z"
        expectedInputs: {}
        expectedOutputs: {}
        updatedAt: "2022-04-08T13:17:18.634988Z"
      id:
        domain: development
        name: flyte.workflows.workflow.my_wf
        project: hackday
        resourceType: LAUNCH_PLAN
        version: v0.0.4
      spec:
        annotations: {}
        authRole: {}
        defaultInputs: {}
        entityMetadata: {}
        fixedInputs: {}
        labels: {}
        rawOutputDataConfig: {}
        workflowId:
          domain: development
          name: flyte.workflows.workflow.my_wf
          project: hackday
          resourceType: WORKFLOW
          version: v0.0.4
    - closure:
        createdAt: "2022-04-08T09:47:03.133154Z"
        expectedInputs: {}
        expectedOutputs: {}
        updatedAt: "2022-04-08T09:47:03.133154Z"
      id:
        domain: development
        name: flyte.workflows.workflow.my_wf
        project: hackday
        resourceType: LAUNCH_PLAN
        version: v0.0.3
      spec:
        annotations: {}
        authRole: {}
        defaultInputs: {}
        entityMetadata: {}
        fixedInputs: {}
        labels: {}
        rawOutputDataConfig: {}
        workflowId:
          domain: development
          name: flyte.workflows.workflow.my_wf
          project: hackday
          resourceType: WORKFLOW
          version: v0.0.3
    - closure:
        createdAt: "2022-04-08T09:40:20.073883Z"
        expectedInputs: {}
        expectedOutputs: {}
        updatedAt: "2022-04-08T09:40:20.073883Z"
      id:
        domain: development
        name: flyte.workflows.workflow.my_wf
        project: hackday
        resourceType: LAUNCH_PLAN
        version: v0.0.2
      spec:
        annotations: {}
        authRole: {}
        defaultInputs: {}
        entityMetadata: {}
        fixedInputs: {}
        labels: {}
        rawOutputDataConfig: {}
        workflowId:
          domain: development
          name: flyte.workflows.workflow.my_wf
          project: hackday
          resourceType: WORKFLOW
          version: v0.0.2
    - closure:
        createdAt: "2022-04-08T09:19:29.014621Z"
        expectedInputs: {}
        expectedOutputs:
          variables:
            o0:
              description: o0
              type:
                simple: STRING
        updatedAt: "2022-04-08T09:19:29.014621Z"
      id:
        domain: development
        name: flyte.workflows.workflow.my_wf
        project: hackday
        resourceType: LAUNCH_PLAN
        version: v0.0.1
      spec:
        annotations: {}
        authRole: {}
        defaultInputs: {}
        entityMetadata: {}
        fixedInputs: {}
        labels: {}
        rawOutputDataConfig: {}
        workflowId:
          domain: development
          name: flyte.workflows.workflow.my_wf
          project: hackday
          resourceType: WORKFLOW
          version: v0.0.1
  • I seem to be getting all the different versions of the launch plan. I have not entered the authRole in the LP at any given point, only tried to override it in the UI
  • Prafulla Mahindrakar

    3 months ago
    Can you try the latest beta release and see if you are facing the same issue https://github.com/flyteorg/flyte/releases/tag/v1.0.0-b1
  • Hampus Rosvall

    3 months ago
    Same thing when I use the cr.flyte.org/flyteorg/flyteadmin:v0.6.149 admin image @Prafulla Mahindrakar
  • I.e., I pass the role here
  • Prafulla Mahindrakar

    3 months ago
    Thanks Hampus. Let me try to check more on this
  • So the issue seems to be that the IAM role ARN passed from the UI or flyteconsole is correctly passed to the pod in its annotations field, but the AWS_ROLE_ARN env variable on the pod keeps using the flyte-user-role. This seems incorrect. Also, when a non-existent role is passed in the UI/CLI, the pod is still able to do S3 operations, which we assume happens because of the flyte-user-role env variable.
  • Hampus Rosvall

    3 months ago
    Yes, so basically if we provide an IAM role for the launch plan, either by passing it through flyteconsole or by using flytectl, it gets picked up in the Pod annotations, but the Pod still seems to use the flyte-user-role set in the environment variables. E.g., I can pass iamRoleArn: asdf and I am still able to interact with AWS resources as per my flyte-user-role IAM policy.
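One way to spot the mismatch described above is to compare the role annotation on a pod with the AWS_ROLE_ARN values in its containers' environment. The sketch below works on the JSON form of a pod (for example the output of kubectl get pod <name> -n <namespace> -o json); the helper name and the sample pod are made up for illustration, and the annotation key is the roleNameKey value shown elsewhere in this thread.

```python
ROLE_ANNOTATION = "eks.amazonaws.com/role-arn"


def compare_pod_roles(pod: dict) -> dict:
    """Report the annotated role, the AWS_ROLE_ARN env values, and whether they differ."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    annotated = annotations.get(ROLE_ANNOTATION)

    env_roles = set()
    for container in pod.get("spec", {}).get("containers", []):
        for env in container.get("env") or []:
            if env.get("name") == "AWS_ROLE_ARN" and env.get("value"):
                env_roles.add(env["value"])

    return {
        "annotation_role": annotated,
        "env_roles": sorted(env_roles),
        # A mismatch means the env var points at a role other than the annotated one.
        "mismatch": bool(env_roles) and env_roles != {annotated},
    }


# Hand-written sample resembling the situation in the thread (values are illustrative).
sample_pod = {
    "metadata": {"annotations": {ROLE_ANNOTATION: "asdf"}},
    "spec": {
        "containers": [
            {
                "name": "task",
                "env": [
                    {
                        "name": "AWS_ROLE_ARN",
                        "value": "arn:aws:iam::111122223333:role/eks-flyte-user-role",
                    }
                ],
            }
        ]
    },
}

result = compare_pod_roles(sample_pod)
print(result)
```

To run it against a live pod, the JSON could be loaded with json.load(sys.stdin) and piped in from kubectl.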
  • katrina

    3 months ago
    hey @Hampus Rosvall do you mind fetching the execution using flytectl so we can inspect the security context there?
    flytectl get execution -p flytesnacks -d development <name> -o yaml
  • Hampus Rosvall

    3 months ago
    @katrina sure, using 0.19.4 release?
  • katrina

    3 months ago
    yup that works
  • Prafulla Mahindrakar

    3 months ago
    I tried it on ghcr.io/flyteorg/flyteadmin-release:v1.0.0-b1 with an unknown IAM role:
    k get pods -n flytesnacks-development  avj9vsrl8qxx8bmbz9mv-n0-0 -o yaml
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
        eks.amazonaws.com/role-arn: abcdefgh
        kubernetes.io/psp: eks.privileged
    The execution failed for me, as expected:
    Exception: Called process exited with error code: 1.  Stderr dump:
    
              b'upload failed: ../tmp/flyte/local_flytekit/ae64be4b354052b1124d2fff213d9654/engine_dir/error.pb to s3://flyte-demo/metadata/propeller/flytesnacks-development-avj9vsrl8qxx8bmbz9mv/n0/data/0/error.pb An error occurred (AccessDenied) when calling the PutObject operation: Access Denied\n'
            reason: Error
            startedAt: "2022-04-25T17:10:00Z"
  • Hampus Rosvall

    3 months ago
    @katrina
    flytectl get execution -p hackday -d development f0dbbc7271146457a8d5 -o yaml
    closure:
      createdAt: "2022-04-25T14:12:08.553786193Z"
      duration: 46.179998699s
      outputs:
        uri: s3://<bucket>/metadata/propeller/hackday-development-f0dbbc7271146457a8d5/end-node/data/0/outputs.pb
      phase: SUCCEEDED
      startedAt: "2022-04-25T14:12:13.666273249Z"
      stateChangeDetails:
        occurredAt: "2022-04-25T14:12:08.553786193Z"
      updatedAt: "2022-04-25T14:12:59.846271699Z"
      workflowId:
        domain: development
        name: flyte.workflows.workflow.my_wf
        project: hackday
        resourceType: WORKFLOW
        version: v0.0.5
    id:
      domain: development
      name: f0dbbc7271146457a8d5
      project: hackday
    spec:
      authRole:
        assumableIamRole: arn:aws:iam::asdasd
      launchPlan:
        domain: development
        name: flyte.workflows.workflow.my_wf
        project: hackday
        resourceType: LAUNCH_PLAN
        version: v0.0.5
      metadata:
        systemMetadata: {}
      securityContext:
        runAs:
          iamRole: arn:aws:iam::asdasd
    So it actually looks correct, but the task downloads some data from S3 and runs successfully, which makes me think it is using another role. Or am I missing something?
  • @Prafulla Mahindrakar what does your ENV look like?
  • Prafulla Mahindrakar

    3 months ago
    It doesn’t have the ENV that you are seeing. I was checking how AWS_ROLE_ARN and the other AWS env variables show up, and it seems those are injected.
    memory:  500Mi
        Environment:
          FLYTE_INTERNAL_CONFIGURATION_PATH:  /root/sandbox.config
          FLYTE_INTERNAL_IMAGE:               ghcr.io/flyteorg/flytecookbook:core-773447b298bfa8ecfc2b25983ce1ed2d33753d01
          FLYTE_INTERNAL_EXECUTION_WORKFLOW:  flytesnacks:development:core.basic.lp.go_greet
          FLYTE_INTERNAL_EXECUTION_ID:        avj9vsrl8qxx8bmbz9mv
          FLYTE_INTERNAL_EXECUTION_PROJECT:   flytesnacks
          FLYTE_INTERNAL_EXECUTION_DOMAIN:    development
          FLYTE_ATTEMPT_NUMBER:               0
          FLYTE_INTERNAL_TASK_PROJECT:        flytesnacks
          FLYTE_INTERNAL_TASK_DOMAIN:         development
          FLYTE_INTERNAL_TASK_NAME:           core.basic.lp.greet
          FLYTE_INTERNAL_TASK_VERSION:        773447b298bfa8ecfc2b25983ce1ed2d33753d01
          FLYTE_INTERNAL_PROJECT:             flytesnacks
          FLYTE_INTERNAL_DOMAIN:              development
          FLYTE_INTERNAL_NAME:                core.basic.lp.greet
          FLYTE_INTERNAL_VERSION:             773447b298bfa8ecfc2b25983ce1ed2d33753d01
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nvnx4 (ro)
  • Admin has a config option to set the annotation key. Can you check whether you have that set?
    flyteadmin:
          roleNameKey: "eks.amazonaws.com/role-arn"
          profilerPort: 10254
          eventVersion: 2
          metricsScope: "flyte:"
          metadataStoragePrefix:
  • Hampus Rosvall

    3 months ago
    @Prafulla Mahindrakar the AWS_ env vars are set by the Service Account on EKS (https://docs.aws.amazon.com/eks/latest/userguide/specify-service-account-role.html)
  • So my guess is that if we override the service account in the security context, we will assume the project-specific role rather than the default one.
  • I tried creating a Service Account and providing it in the security context in the launch plan, and it gets picked up correctly by the Pod.
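A (project, domain)-specific service account like the one described above is an ordinary Kubernetes ServiceAccount carrying the eks.amazonaws.com/role-arn annotation. The sketch below builds such a manifest as JSON, which kubectl apply -f also accepts; the helper name, the account name, and the role ARN are hypothetical placeholders, and the namespace follows the project-domain pattern seen in this thread.

```python
import json


def service_account_manifest(name: str, namespace: str, role_arn: str) -> dict:
    """Build a ServiceAccount manifest annotated with the IAM role it should assume."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


# Hypothetical names for illustration only.
manifest = service_account_manifest(
    name="hackday-sa",
    namespace="hackday-development",
    role_arn="arn:aws:iam::111122223333:role/hackday-project-role",
)
print(json.dumps(manifest, indent=2))
```

The account name would then be supplied as the Kubernetes service account in the security context, and the role's trust policy has to allow that service account to assume it.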
  • katrina

    3 months ago
    great, so the behavior was due to the project-level setting being picked up?
  • Mücahit

    3 months ago
    Hey! I also experienced the same situation today: specifying a role from the console > Advanced Settings adds an annotation to the pod, but the AWS_ env vars are not set on the pod. It works when an IRSA-enabled ServiceAccount is specified instead of an IAM role.
  • Hampus Rosvall

    3 months ago
    @katrina I think the problem is that the provided IAM Role in the launch plan never gets assumed by the role in the service account. Not sure how that should be handled, but creating a (project, domain) specific service account is sufficient currently
  • katrina

    3 months ago
    Sorry, trying to understand: in the example above, iamRole: arn:aws:iam::asdasd was not correct or expected?
  • Hampus Rosvall

    3 months ago
    No exactly, I was trying to enter an arbitrary role that should fail, but the workflow still ran successfully as it was never assumed.
  • katrina

    3 months ago
    oh sorry, i thought that was redacted 😂 you mean asdasd literally, i see!
  • sounds like something on our end to debug, thank you for walking through this with us!
  • Hampus Rosvall

    3 months ago
    Sure! If possible, can you point me to the code where you parse additional IAM Roles provided in the security context?
  • katrina

    3 months ago
    it's a little convoluted b/c of all the deprecated fields in idl and the overridable layers at which you can set the context, but here's https://github.com/flyteorg/flyteadmin/blob/master/pkg/manager/impl/execution_manager.go#L493:28 & https://github.com/flyteorg/flyteadmin/blob/master/pkg/manager/impl/execution_manager.go#L912,L913 if you want to poke around
  • Prafulla Mahindrakar

    3 months ago
    In my experimentation, I wasn’t able to use the IAM role that gets passed into the pods through annotations; it always depended on which service account the pod was using. The service account should be annotated with the right IAM role so that the aws cli command used by the container performs the S3 operations successfully. @katrina do we know if this ever worked, or does it require some additional setup in the cluster, or maybe I am missing something? I tried creating a sample test pod with annotations and a service account manually and performed S3 operations from within the container. My observations are consistent: the pod isn’t using the annotation and uses the service account instead.
    kubectl exec -it test-praf -n flytesnacks-development -- /bin/bash
    root@test-praf:~# touch a
    root@test-praf:~# aws s3 cp a s3://flyte-demo/metadata/propeller/flytesnacks-development-akqfpd7n4b8lh78m9c5r/n0/data/0/a
    upload failed: ./a to s3://flyte-demo/metadata/propeller/flytesnacks-development-akqfpd7n4b8lh78m9c5r/n0/data/0/a An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
    The pod has the following
    k get pod -n flytesnacks-development test-praf -o yaml            
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        iam.amazonaws.com/role: arn:aws:iam::590375264460:role/eksctl-flyte-demo-2-addon-iamserviceaccount-Role1-11QUDNRU7X84P
       ....
      name: test-praf
      namespace: flytesnacks-development
    ....
      serviceAccount: default
      serviceAccountName: default
    Now if I use it with a service account annotated with an IAM role that has the permissions (the one ending in 1NRJQGB2NSHL9), then AWS_ROLE_ARN is exported with that annotated role, and any additional role-related annotations on the pod are simply ignored:
    k get pod -n flytesnacks-development test-praf -o yaml
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        iam.amazonaws.com/role: arn:aws:iam::590375264460:role/eksctl-flyte-demo-2-addon-iamserviceaccount-Role1-11QUDNRU7X84P
    
    ....
        - name: AWS_ROLE_ARN
          value: arn:aws:iam::590375264460:role/eksctl-flyte-demo-2-addon-iamserviceaccount-Role1-1NRJQGB2NSHL9
        - name: AWS_WEB_IDENTITY_TOKEN_FILE
          value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    ....
      serviceAccount: demo
      serviceAccountName: demo
    And it allows me to copy to S3 using this annotated role.
    kubectl exec -it test-praf -n flytesnacks-development -- /bin/bash
    root@test-praf:~# touch a
    root@test-praf:~# aws s3 cp a s3://flyte-demo/metadata/propeller/flytesnacks-development-akqfpd7n4b8lh78m9c5r/n0/data/0/a
    upload: ./a to s3://flyte-demo/metadata/propeller/flytesnacks-development-akqfpd7n4b8lh78m9c5r/n0/data/0/a
    root@test-praf:~# cat ~/.aws/cli/cache/574579698c8d6815d0805a57d875c58fa630e77a.json 
    {"Credentials": {"AccessKeyId": "...", ..., "Expiration": "2022-04-26T13:52:59Z"}, "SubjectFromWebIdentityToken": "system:serviceaccount:flytesnacks-development:demo", "AssumedRoleUser": {"AssumedRoleId": "AROAYS5I3UDGCIBQWKOCJ:botocore-session-1650977579", "Arn": "arn:aws:sts::590375264460:assumed-role/eksctl-flyte-demo-2-addon-iamserviceaccount-Role1-1NRJQGB2NSHL9/botocore-session-1650977579"}, "Provider": "arn:aws:iam::590375264460:oidc-provider/oidc.eks.us-east-2.amazonaws.com/id/46B254ACC1AC1B23CCCA2973F62AB323", "Audience": "sts.amazonaws.com", "ResponseMetadata": {"RequestId": "b9301edc-ce4f-42ff-a049-4cb09936c5c4", "HTTPStatusCode": 200, "HTTPHeaders": {"x-amzn-requestid": "b9301edc-ce4f-42ff-a049-4cb09936c5c4", "content-type": "text/xml", "content-length": "1975", "date": "Tue, 26 Apr 2022 12:52:59 GMT"}, "RetryAttempts": 0}}
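The cached credentials file shown above already states which role was assumed and on behalf of which service account. A small sketch for pulling those two facts out of such a file follows; the function name is made up, and the sample is a trimmed copy of the fields from the output above.

```python
import json


def summarize_cached_credentials(raw: str) -> dict:
    """Extract the assumed role name, session name, and web identity subject."""
    data = json.loads(raw)
    arn = data["AssumedRoleUser"]["Arn"]
    # Expected shape: arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
    resource = arn.split(":", 5)[5]
    _, role_name, session_name = resource.split("/", 2)
    return {
        "role_name": role_name,
        "session_name": session_name,
        "subject": data.get("SubjectFromWebIdentityToken"),
    }


# Trimmed copy of the relevant fields from the cache file in the thread.
sample = json.dumps(
    {
        "SubjectFromWebIdentityToken": "system:serviceaccount:flytesnacks-development:demo",
        "AssumedRoleUser": {
            "Arn": "arn:aws:sts::590375264460:assumed-role/"
            "eksctl-flyte-demo-2-addon-iamserviceaccount-Role1-1NRJQGB2NSHL9/"
            "botocore-session-1650977579"
        },
    }
)

summary = summarize_cached_credentials(sample)
print(summary)
```

In the sample, the role name ends in 1NRJQGB2NSHL9 (the role annotated on the demo service account) rather than 11QUDNRU7X84P (the role in the pod annotation).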
  • After discussion, I have created an issue for this (https://github.com/flyteorg/flyte/issues/2417). So basically, in your case you should only be using a service account that is annotated with the right role.