# ask-the-community
s
So I’m stuck on S3 access - I created a service account called
flyte-executor
with the
flyte-user-role
(which has full s3 access) attached as an annotation and running Flyte executions with this service account, but it’s giving me PutObject access denied error. This service account is in the project+domain namespace. What am I doing wrong?
y
stupid question but the iam role has the putobject permission right?
this is annoying to do, I apologize, but maybe we can also try running the pod manually, and instead of the pyflyte command, run
aws sts get-caller-identity
s
flyte-user-role has
AmazonS3FullAccess
policy attached to it
How can I run the pod manually?
I see the pod for the task that failed but the status is failed so I can’t exec into it
When I log the pod, I see some other errors as well: for example,
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
@Yee Should I maybe add
sleep infinity
to the args in the yaml and
exec
into it to run
aws sts get-caller-identity
? Let me know when you got a minute - thanks!
s
cc: @Yee
y
hey sorry is this still an issue @seunggs
s
Yes - can you provide me a bit more detail on what to do to start debugging why I have s3 permission issues?
Thanks! @Yee
y
if you want to run the pod manually you have to create a yaml file, but yeah this will work
create the pod, make sure you have the correct service account, remove the status field, ownership and anything else that looks like it might be related to an existing flyteworkflow execution
update the args to sleep infinity and then yeah you can attach to it and investigate.
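The manifest cleanup described above can be sketched in Python. This is only a rough illustration, assuming the failed pod was exported with `kubectl get pod <pod name> -o json` and loaded into a dict; the function name `make_debug_pod` and the exact list of metadata fields to drop are assumptions, not something Flyte prescribes.

```python
import copy


def make_debug_pod(pod: dict, new_name: str) -> dict:
    """Return a copy of a pod manifest that only sleeps, for manual debugging."""
    debug = copy.deepcopy(pod)

    # The status block describes the old, failed pod and must not be reused.
    debug.pop("status", None)

    meta = debug.setdefault("metadata", {})
    # Assumed list of fields that tie the pod to the original execution
    # or that the API server assigns itself.
    for key in ("ownerReferences", "resourceVersion", "uid",
                "creationTimestamp", "finalizers"):
        meta.pop(key, None)
    meta["name"] = new_name

    # Keep the image, env, volumes and serviceAccountName untouched;
    # only swap what the containers run.
    for container in debug.get("spec", {}).get("containers", []):
        container["command"] = ["sleep"]
        container["args"] = ["infinity"]
    return debug
```

The result can be dumped back to JSON, created with `kubectl apply -f`, and inspected with `kubectl exec`. The service account stays the same, so the pod should get the same credentials as the failing task.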
s
OK so create a completely new pod
I’ll try it tonight and report back - thank you!
y
i’m not convinced that’s necessary though. are you sure the service account has the iam role attached
maybe easier to first try to assume the role manually just on your laptop through the aws cli
run the same s3 command
just isolate out the aws side first, and then deal with the k8s side
s
Hmm ok - here’s my sa yaml that’s deployed
apiVersion: v1
imagePullSecrets:
- name: gcr-json-key
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::xxx:role/flyte-user-role
  labels:
    app.kubernetes.io/managed-by: pulumi
  name: flyte-executor
  namespace: shelly-robotics-bipedal-robot-development
  resourceVersion: "57747250"
  uid: 9db3e9da-cf32-4a78-8b06-81d83b66c611
secrets:
- name: flyte-executor-token-l6rkj
As you can see the annotation is there - unless that’s not sufficient to link an SA to an iam role
y
can you
kubectl get pod <pod name> -o yaml
and grep for “iam”
you should see aws-iam-token a couple times
and if that’s the case then yeah, can you try assuming the role locally
s
So
kubectl get pod <pod name> -o yaml | grep 'iam'
returns this
value: arn:aws:iam::xxx:role/flyte-user-role
      name: aws-iam-token
  - name: aws-iam-token
Does this look right?
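The same check can be done more explicitly than with grep. A minimal sketch, assuming the pod manifest was exported with `-o json` and loaded into a dict: the EKS pod identity webhook normally injects the `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` environment variables plus a volume named `aws-iam-token`. The function name `irsa_injection_report` is hypothetical.

```python
def irsa_injection_report(pod: dict) -> dict:
    """Report which IAM-role-for-service-account pieces appear in a pod spec."""
    spec = pod.get("spec", {})

    # Collect env var names across all containers.
    env_names = {
        env.get("name")
        for container in spec.get("containers", [])
        for env in container.get("env", [])
    }
    # Collect pod-level volume names.
    volume_names = {volume.get("name") for volume in spec.get("volumes", [])}

    return {
        "AWS_ROLE_ARN": "AWS_ROLE_ARN" in env_names,
        "AWS_WEB_IDENTITY_TOKEN_FILE": "AWS_WEB_IDENTITY_TOKEN_FILE" in env_names,
        "aws-iam-token volume": "aws-iam-token" in volume_names,
    }
```

If all three values are true, the service account annotation was picked up when the pod was created. That matches the grep output above.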
y
Yeah
Can you try assuming locally?
s
When I try
aws sts assume-role …
locally, I get this error:
An error occurred (AccessDenied) when calling the AssumeRole operation
y
mm
okay sure, try copying the pod yaml and editing so that the command is just a sleep
maybe your role is not set up for you to assume it
s
Hmm - I have admin access to all resources. I do have multiple clusters running, one of which is flyte and here’s the trusted relationships:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::xxx:oidc-provider/oidc.eks.us-west-1.amazonaws.com/id/2A6739B7813451087E3258C60BC37CF4"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-west-1.amazonaws.com/id/2A6739B7813451087E3258C60BC37CF4:aud": "sts.amazonaws.com"
        }
      }
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Where oidc is from flyte eks cluster
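A likely reason the local `aws sts assume-role` call failed even with admin access: a role's trust policy has to name the caller as a principal, and this one only trusts the cluster's OIDC provider (for `sts:AssumeRoleWithWebIdentity`) and the EC2 service. The sketch below lists who a trust policy trusts. It is a simplified reading that ignores `Condition` blocks and does not reproduce IAM's real evaluation logic; the function names are hypothetical.

```python
from typing import List, Tuple


def trusted_principals(trust_policy: dict) -> List[Tuple[str, str, str]]:
    """List (action, principal_type, principal) triples from Allow statements."""
    triples = []
    for stmt in trust_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        principal = stmt.get("Principal", {})
        if isinstance(principal, str):  # e.g. "Principal": "*"
            principal = {"*": principal}
        for ptype, values in principal.items():
            if isinstance(values, str):
                values = [values]
            for action in actions:
                for value in values:
                    triples.append((action, ptype, value))
    return triples


def user_can_assume_directly(trust_policy: dict) -> bool:
    """True if some Allow statement grants sts:AssumeRole to an AWS principal."""
    return any(
        action in ("sts:AssumeRole", "sts:*", "*") and ptype in ("AWS", "*")
        for action, ptype, _ in trusted_principals(trust_policy)
    )


# Trust policy from the thread, trimmed to the fields inspected here.
POLICY = {
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::xxx:oidc-provider/oidc.eks.us-west-1.amazonaws.com/id/2A6739B7813451087E3258C60BC37CF4"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
        },
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        },
    ]
}

print(user_can_assume_directly(POLICY))  # False
```

Under this reading, the AccessDenied on the laptop does not say anything about whether pods can use the role. Pods go through the federated statement instead.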
y
what do you mean?
s
Oh I was responding to your comment regarding the role not being setup to be assumed
I think a pod in the flyte cluster should be able to assume this role via service account
y
yeah try that i guess, just make the pod yaml
let us know if you need an example
s
OK would it be fine to just get rid of the status from the yaml I get from the failed task pod? And then update the args to
sleep infinity
?
OK I just changed the pod name and created it and ran
aws sts get-caller-identity
and here’s the response:
{
  "UserId": "xxx:botocore-session-1663707573",
  "Account": "xxx",
  "Arn": "arn:aws:sts::xxx:assumed-role/flyte-user-role/botocore-session-1663707573"
}
y
can you try to run the failing s3 command also?
s
Hmm where can I grab this command?
You mean the pyflyte command?
y
no it was in the error
putobject access denied error
just
cat > abc
hello
^C
and then try to aws s3 cp abc s3:/….
s
ok one sec
No error - confirmed that the file was uploaded to the right bucket path in s3
aws s3 cp abc s3://sidetrek-flyte-cluster-flyte-bucket/metadata/propeller/shelly-robotics-bipedal-robot-development-an6gvhl5dn8vr44nn9ds/n0/data/0/abc.txt
y
but when the task runs with the same role, it can’t?
s
Yes that seems to be the case
One thing though - maybe I’m doing it wrong
So when I launch the execution in the flyte dashboard
I only fill out Kubernetes Service Account in the
Security Context
section - leaving IAM Role field empty
Could that be what’s creating the problem??
Do I need to fill out both? I assumed service account is enough since it references the iam role in it
y
no one should be enough
s
Oh ok then it’s strange
y
can you change your task code to call sts get caller identity
s
I seem to be able to run s3 commands manually in the pod fine but it errors out when I run it from dashboard
s
You mean replace the actual code of the failing task with just this?
>>> import boto3
>>> boto3.client('sts').get_caller_identity().get('Account')
y
yeah
in the body of the task
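For logging from inside a task, a small helper along these lines may be handy. It is a sketch only: `describe_caller` is a hypothetical name, and it takes the STS client as an argument so it can be tried without AWS access. In a real task you would pass `boto3.client("sts")`.

```python
def describe_caller(sts_client) -> str:
    """Summarize which AWS identity the current process is running as."""
    # get_caller_identity returns a dict with UserId, Account and Arn.
    identity = sts_client.get_caller_identity()
    return f"account={identity['Account']} arn={identity['Arn']}"
```

If the ARN printed from the task shows `assumed-role/flyte-user-role/...`, the task runs with the same role as the manually created pod.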
s
OK
Hi @Yee I had a chance to run this and I get the same errors: a whole bunch of HeadObject 403 forbidden errors and PutObject access denied errors like this:
{
  "asctime": "2022-09-22 00:01:44,624",
  "name": "flytekit",
  "levelname": "ERROR",
  "message": "Exception when trying to execute ['aws', '--endpoint-url', 'http://minio.flyte:9000', 's3', 'cp', '--recursive', '--acl', 'bucket-owner-full-control', '/tmp/flyte-oz6o659c/sandbox/local_flytekit/engine_dir', 's3://sidetrek-flyte-cluster-flyte-bucket/metadata/propeller/shelly-robotics-bipedal-robot-development-aq44hcpb7rdhwxpw22k9/n0/data/3'], reason: Called process exited with error code: 1.  Stderr dump:\n\nb'upload failed: ../tmp/flyte-oz6o659c/sandbox/local_flytekit/engine_dir/error.pb to s3://sidetrek-flyte-cluster-flyte-bucket/metadata/propeller/shelly-robotics-bipedal-robot-development-aq44hcpb7rdhwxpw22k9/n0/data/3/error.pb An error occurred (AccessDenied) when calling the PutObject operation: Access Denied.\\n'"
}
So it looks like it doesn’t even get to running the task code - bunch of fatal errors due to s3 issue
s
Are you still seeing the issue, @seunggs?
s
Hey @Samhita Alla thanks for checking up on this - I was just on a call with @Yee who helped me resolve it. The problem was my mistake in the helm chart values setup: it was still using minio. Thank you!
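In hindsight the error log already hinted at this: the failing command was run with `--endpoint-url http://minio.flyte:9000`, while the manual `aws s3 cp` in the debug pod had no endpoint override. A rough sketch of spotting that in a logged command follows. The function names are hypothetical, and treating any host ending in `.amazonaws.com` as "real AWS" is a simplifying assumption.

```python
from typing import List, Optional
from urllib.parse import urlparse


def endpoint_override(argv: List[str]) -> Optional[str]:
    """Return the value passed to --endpoint-url in an aws CLI argv, if any."""
    for i, arg in enumerate(argv):
        if arg == "--endpoint-url" and i + 1 < len(argv):
            return argv[i + 1]
        if arg.startswith("--endpoint-url="):
            return arg.split("=", 1)[1]
    return None


def targets_aws(argv: List[str]) -> bool:
    """Guess whether the command talks to AWS rather than a custom endpoint."""
    endpoint = endpoint_override(argv)
    if endpoint is None:
        return True  # no override: the CLI uses its default AWS endpoints
    host = urlparse(endpoint).hostname or ""
    return host.endswith(".amazonaws.com")


# Start of the command list from the flytekit error message above.
logged = ["aws", "--endpoint-url", "http://minio.flyte:9000", "s3", "cp", "--recursive"]
print(endpoint_override(logged))  # http://minio.flyte:9000
print(targets_aws(logged))        # False
```

When the IAM side checks out but uploads still fail, checking the endpoint in the logged command is a quick way to see whether requests are going to S3 at all.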