dry-oxygen-50969
11/22/2023, 2:32 PM11/22/2023 2:27:40 PM UTC task submitted to K8s
11/22/2023 2:27:40 PM UTC [ContainersNotReady|ContainerCreating]: containers with unready status: [f5e7547d4c8994d4d992-n0-0]|
kages/flytekit/bin/entrypoint.py:519 in │
│ fast_execute_task_cmd │
│ │
│ ❱ 519 │ │ _download_distribution(additional_distribution, dest_dir) │
│ │
│ /usr/local/lib/python3.11/site-packages/flytekit/core/utils.py:295 in │
│ wrapper │
│ │
│ ❱ 295 │ │ │ │ return func(*args, **kwargs) │
│ │
│ /usr/local/lib/python3.11/site-packages/flytekit/tools/fast_registration.py: │
│ 113 in download_distribution │
│ │
│ ❱ 113 │ FlyteContextManager.current_context().file_access.get_data(additio │
│ │
│ /usr/local/lib/python3.11/site-packages/flytekit/core/data_persistence.py:47 │
│ 5 in get_data │
│ │
│ ❱ 475 │ │ │ raise FlyteAssertion( │
╰──────────────────────────────────────────────────────────────────────────────╯
FlyteAssertion: Failed to get data from
s3://<my-bucket-here>/flytesnacks/development/RUD7F4QDHIZRCQGDFXZKERK4GM======/scr
ipt_mode.tar.gz to /root/ (recursive=False).
Original exception: Access Denied
I feel pretty lost and I'm unsure where to go from here. I'd really appreciate any help or advice! Thank you :)
Update: I manually gave the file in the error above full world read permissions for testing and it did make it past that issue. Regardless, I'm given a new failure error:
tar: Removing leading `/' from member names
average-finland-92144
11/22/2023, 4:04 PMaverage-finland-92144
11/22/2023, 4:05 PMdry-oxygen-50969
11/22/2023, 4:07 PMaverage-finland-92144
11/22/2023, 4:07 PMaws iam get-role --role-name flyte-system-role --query Role.AssumeRolePolicyDocument
dry-oxygen-50969
11/22/2023, 4:08 PMdry-oxygen-50969
11/22/2023, 4:09 PM{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::<my-aws-acct-id>:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/DC178F5D689F6DDF61B7E0F99688DED4"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.us-east-1.amazonaws.com/id/DC178F5D689F6DDF61B7E0F99688DED4:aud": "sts.amazonaws.com",
"oidc.eks.us-east-1.amazonaws.com/id/DC178F5D689F6DDF61B7E0F99688DED4:sub": "system:serviceaccount:flyte:flyte-backend-flyte-binary"
}
}
}
]
}
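The trust policy above names exactly one Kubernetes service account in its `sub` condition, so only pods running as that account can assume the role through it. A minimal sketch for listing the subjects a trust policy accepts, assuming the policy has already been fetched as JSON; the account ID `111122223333`, the OIDC ID `EXAMPLE`, and the helper `trusted_subjects` are placeholders for illustration:

```python
import json

# Trust policy shaped like the one above; account and OIDC IDs are placeholders.
TRUST_POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:aud": "sts.amazonaws.com",
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:sub": "system:serviceaccount:flyte:flyte-backend-flyte-binary"
        }
      }
    }
  ]
}
""")


def trusted_subjects(policy: dict) -> list[str]:
    """Collect every value of a condition key ending in ':sub'."""
    subjects = []
    for statement in policy["Statement"]:
        for operator_block in statement.get("Condition", {}).values():
            for key, value in operator_block.items():
                if key.endswith(":sub"):
                    # A condition value may be a single string or a list.
                    subjects.extend([value] if isinstance(value, str) else value)
    return subjects


print(trusted_subjects(TRUST_POLICY))
```

Any service account missing from that list, such as `default` in a project namespace, would need its own role or an additional trust entry.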
dry-oxygen-50969
11/22/2023, 4:09 PMaverage-finland-92144
11/22/2023, 4:10 PMkubectl describe sa -n flyte flyte-backend-flyte-binary
dry-oxygen-50969
11/22/2023, 4:11 PMName: flyte-backend-flyte-binary
Namespace: flyte
Labels: app.kubernetes.io/instance=flyte-backend
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=flyte-binary
app.kubernetes.io/version=1.16.0
helm.sh/chart=flyte-binary-v1.10.0
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::<my-aws-acct-id>:role/flyte-system-role
meta.helm.sh/release-name: flyte-backend
meta.helm.sh/release-namespace: flyte
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: <none>
Events: <none>
average-finland-92144
11/22/2023, 4:12 PMdry-oxygen-50969
11/22/2023, 4:13 PM{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:*",
"s3-object-lambda:*"
],
"Resource": "*"
}
]
}
dry-oxygen-50969
11/22/2023, 4:14 PMeksctl create iamserviceaccount --cluster=<my-flyte-cluster> --name=flyte-backend-flyte-binary --role-only --role-name=flyte-system-role --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess --approve --region <region-code> --namespace flyte
average-finland-92144
11/22/2023, 4:15 PMtar: Removing leading `/' from member names
I guess you're running pyflyte run --remote ..., right?
dry-oxygen-50969
11/22/2023, 4:15 PMdry-oxygen-50969
11/22/2023, 4:16 PMpyflyte run --remote hello_world.py hello_world_wf
dry-oxygen-50969
11/22/2023, 4:16 PMaverage-finland-92144
11/22/2023, 4:17 PMtar: Removing leading `/' from member names <-- this is not an error.
It's more like a cryptic but normal log of the untar operation that happens when you do "fast registration", but that's a different story.
average-finland-92144
11/22/2023, 4:17 PMkubectl describe sa default -n flytesnacks-development
(Assuming you're not providing a different project-domain)
dry-oxygen-50969
11/22/2023, 4:18 PMName: default
Namespace: flytesnacks-development
Labels: <none>
Annotations: <none>
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: <none>
Events: <none>
dry-oxygen-50969
11/22/2023, 4:18 PMaverage-finland-92144
11/22/2023, 4:18 PMdry-oxygen-50969
11/22/2023, 4:18 PMdry-oxygen-50969
11/22/2023, 4:18 PMaverage-finland-92144
11/22/2023, 4:19 PMaverage-finland-92144
11/22/2023, 4:19 PMaverage-finland-92144
11/22/2023, 4:26 PMeksctl create iamserviceaccount --cluster=<your-EKS-cluster-name> --name=default --role-only --role-name=flyte-workers --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess --approve --region <region-code> --namespace flyte
2. If you run aws iam get-role --role-name flyte-workers --query Role.AssumeRolePolicyDocument
it should look similar to:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::<acct-id>:oidc-provider/oidc.eks.<region-code>.amazonaws.com/id/<UUID-OIDC>"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.<region-code>.amazonaws.com/id/<UUID-OIDC>:sub": "system:serviceaccount:flyte:default",
"oidc.eks.<region-code>.amazonaws.com/id/<UUID-OIDC>:aud": "sts.amazonaws.com"
}
}
}
]
}
3. If that's the case, edit the IAM Role to change from the flyte namespace to *
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::<acct-id>:oidc-provider/oidc.eks.<region-code>.amazonaws.com/id/<UUID-OIDC>"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.<region-code>.amazonaws.com/id/<UUID-OIDC>:sub": "system:serviceaccount:*:default",
"oidc.eks.<region-code>.amazonaws.com/id/<UUID-OIDC>:aud": "sts.amazonaws.com"
}
}
}
]
}
This is because every project-domain combination gets its own namespace with a default KSA in each, so making it a wildcard is a convenience here.
Not ideal, but let me know if it works.
dry-oxygen-50969
11/22/2023, 4:37 PMShould I re-annotate with the new flyte-workers role?
average-finland-92144
11/22/2023, 4:39 PMdry-oxygen-50969
11/22/2023, 4:41 PMaverage-finland-92144
11/22/2023, 4:42 PMChange StringEquals to StringLike
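The reason the operator matters: IAM's `StringEquals` compares the `sub` claim literally, so a `*` in the policy only matches a literal asterisk, while `StringLike` treats `*` and `?` as wildcards. A minimal sketch of the difference, using Python's `fnmatchcase` as a rough stand-in for IAM's matching (an approximation, since `fnmatchcase` also interprets `[...]`); the helper names are hypothetical:

```python
from fnmatch import fnmatchcase


def sub_claim(namespace: str, service_account: str = "default") -> str:
    """Build the `sub` claim presented for a Kubernetes service account."""
    return f"system:serviceaccount:{namespace}:{service_account}"


def string_equals(pattern: str, value: str) -> bool:
    # StringEquals: exact, case-sensitive comparison; "*" is just a character.
    return pattern == value


def string_like(pattern: str, value: str) -> bool:
    # StringLike: "*" matches any run of characters, "?" a single character.
    return fnmatchcase(value, pattern)


PATTERN = "system:serviceaccount:*:default"
claim = sub_claim("flytesnacks-development")

print(string_equals(PATTERN, claim))  # False
print(string_like(PATTERN, claim))    # True
```

The wildcard pattern still rejects other service accounts, for example `flyte-backend-flyte-binary` in the `flyte` namespace, because the account name after the last colon has to be `default`.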
average-finland-92144
11/22/2023, 4:44 PM"Should I re-annotate with the new flyte-workers role?"
Regarding this, make sure your Helm values include the following:
1.
configuration:
  inline:
    cluster_resources:
      customData:
      - production:
        - defaultIamRole:
            value: arn:aws:iam::<acct-id>:role/flyte-workers
      - staging:
        - defaultIamRole:
            value: arn:aws:iam::<acct-id>:role/flyte-workers
      - development:
        - defaultIamRole:
            value: arn:aws:iam::<acct-id>:role/flyte-workers
2.
clusterResourceTemplates:
  inline:
    002_serviceaccount.yaml: |
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: default
        namespace: '{{ namespace }}'
        annotations:
          eks.amazonaws.com/role-arn: '{{ defaultIamRole }}'
You just need to update the acct-id and then run a Helm upgrade.
dry-oxygen-50969
11/22/2023, 4:55 PMpyflyte run --remote hello_world.py hello_world_wf
Failed with Exception Code: SYSTEM:Unknown
RPC Failed, with Status: StatusCode.INTERNAL
details: failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
status code: 403, request id: 7e897ee1-045f-4a91-af19-a5594380fa95
Debug string UNKNOWN:Error received from peer {grpc_message:"failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 7e897ee1-045f-4a91-af19-a5594380fa95", grpc_status:13, created_time:"2023-11-22T09:54:13.754081855-07:00"}
It gives me role permission errors now.
average-finland-92144
11/22/2023, 4:55 PMIs the default SA annotated?
dry-oxygen-50969
11/22/2023, 4:57 PMdry-oxygen-50969
11/22/2023, 5:04 PMFlyteAssertion: Failed to get data from
s3://flyte-metadata/flytesnacks/development/CXNXVNZLWOB3ULGK3EUPEK666M======/scr
ipt_mode.tar.gz to /root/ (recursive=False).
Original exception: Access Denied
And then to temporarily get around this I manually enable public access but it still fails with just:
tar: Removing leading `/' from member names
dry-oxygen-50969
11/22/2023, 5:06 PMThe default SA is not annotated.
average-finland-92144
11/22/2023, 5:06 PMdry-oxygen-50969
11/22/2023, 5:06 PMaverage-finland-92144
11/22/2023, 5:07 PMkubectl edit sa default -n flytesnacks-development
and add the annotation:
eks.amazonaws.com/role-arn: arn:aws:iam::<acct-id>:role/flyte-workers
dry-oxygen-50969
11/22/2023, 5:08 PM
configuration:
  database:
    username: flyteadmin
    password: "<db-pass>"
    host: <db-url>
    dbname: flyteadmin
  storage:
    metadataContainer: flyte-metadata
    userDataContainer: flyte-userdata
    provider: s3
    providerConfig:
      s3:
        region: "us-east-1"
        authType: "iam"
  inline:
    plugins:
      k8s:
        inject-finalizer: true
        default-env-vars:
          - AWS_METADATA_SERVICE_TIMEOUT: 5
          - AWS_METADATA_SERVICE_NUM_ATTEMPTS: 20
    storage:
      cache:
        max_size_mbs: 100
        target_gc_percent: 100
    cluster_resources:
      customData:
      - production:
        - defaultIamRole:
            value: arn:aws:iam::<acct id>:role/flyte-workers
      - staging:
        - defaultIamRole:
            value: arn:aws:iam::<acct id>:role/flyte-workers
      - development:
        - defaultIamRole:
            value: arn:aws:iam::<acct id>:role/flyte-workers
clusterResourceTemplates:
  inline:
    002_serviceaccount.yaml: |
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: default
        namespace: '{{ namespace }}'
        annotations:
          eks.amazonaws.com/role-arn: '{{ defaultIamRole }}'
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::<acct id>:role/flyte-system-role"
Followed by
helm upgrade flyte-backend flyteorg/flyte-binary -n flyte --values eks-starter.yaml
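To see what the `002_serviceaccount.yaml` template is meant to produce per namespace, here is a rough illustration of the placeholder substitution. It is a sketch under assumptions, not Flyte's actual rendering code: the `render` helper is hypothetical, and the `flytesnacks-<domain>` namespace names are assumed from the namespace seen earlier in the thread.

```python
import re

# The ServiceAccount template from the values file, with its two placeholders.
TEMPLATE = """\
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: '{{ namespace }}'
  annotations:
    eks.amazonaws.com/role-arn: '{{ defaultIamRole }}'
"""


def render(template: str, values: dict) -> str:
    """Replace each {{ key }} placeholder with values[key] (illustration only)."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: values[m.group(1)], template)


ROLE = "arn:aws:iam::<acct-id>:role/flyte-workers"  # placeholder account ID

for domain in ("development", "staging", "production"):
    namespace = f"flytesnacks-{domain}"  # assumed project-domain naming
    print(render(TEMPLATE, {"namespace": namespace, "defaultIamRole": ROLE}))
    print("---")
```

If the rendered annotation does not show up on the `default` service account after the upgrade, `kubectl describe sa default -n flytesnacks-development` is the quickest check.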
average-finland-92144
11/22/2023, 5:12 PMdry-oxygen-50969
11/22/2023, 5:15 PMName: default
Namespace: flytesnacks-development
Labels: <none>
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::<acct id>:role/flyte-workers
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: <none>
Events: <none>
It's annotated manually now, but I'm receiving the same failure and tar message.
average-finland-92144
11/22/2023, 5:16 PMflytesnacks-development
namespace?
kubectl get pods -n flytesnacks-development
dry-oxygen-50969
11/22/2023, 5:17 PMNAME READY STATUS RESTARTS AGE
a588f4jx5jcxdwqmr5ms-n0-0 0/1 OOMKilled 0 3m18s
a6zn9kr6frfl82zmqg99-n0-0 0/1 Error 0 14m
a8slvpckm9z42dwp47xt-n0-0 0/1 OOMKilled 0 12m
acxxm4j4g6xx7qbkklql-n0-0 0/1 Error 0 3h33m
advlcfj6r4ccb2qczrj6-n0-0 0/1 Error 0 3h17m
am2htqq2kff5c8cv6zkm-n0-0 0/1 Error 0 3h25m
amjsd9ncgkc24fcbstjh-n0-0 0/1 Error 0 16m
amxjs9gbv78bw6b2s7d5-n0-0 0/1 Error 0 96m
aphp5t4ms5b59vm6cgff-n0-0 0/1 OOMKilled 0 2m39s
ascxj9pb4gtcq7g8hdt4-n0-0 0/1 OOMKilled 0 95m
asq4jdzlp6qxh8bbkxmg-n0-0 0/1 OOMKilled 0 103m
f12e6b2489129437caf9-n0-0 0/1 Error 0 112m
f325c37c092a84f2c831-n0-0 0/1 Error 0 15m
f5e7547d4c8994d4d992-n0-0 0/1 Error 0 169m
f6d21a6916aa94b85919-n0-0 0/1 Error 0 19m
f89dee937e9da4242ba8-n0-0 0/1 OOMKilled 0 4m23s
faa87eed69709459781e-n0-0 0/1 Error 0 104m
fae1758d5567949e6bdc-n0-0 0/1 Error 0 96m
fb6d1f3d7f1844657877-n0-0 0/1 Error 0 3h34m
ffb213020e44741e7859-n0-0 0/1 Error 0 3h15m
dry-oxygen-50969
11/22/2023, 5:21 PMdry-oxygen-50969
11/22/2023, 5:21 PMaverage-finland-92144
11/22/2023, 5:21 PMOOMKilled
Can you add the following to your values file first and upgrade?
configuration:
  inline:
    task_resources:
      defaults:
        cpu: 100m
        memory: 100Mi
        storage: 100Mi
      limits:
        memory: 2Gi
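A container that uses more memory than its limit is terminated with OOMKilled, so it can help to see what these quantities amount to in bytes. A minimal sketch, assuming only the binary suffixes Ki, Mi and Gi appear; `to_bytes` is a hypothetical helper, not a Kubernetes or Flyte API:

```python
# Binary suffixes used by Kubernetes memory quantities.
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '100Mi' or '2Gi' to a byte count."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already a plain byte count


default_memory = to_bytes("100Mi")
limit_memory = to_bytes("2Gi")

print(default_memory)                  # 104857600
print(limit_memory)                    # 2147483648
print(limit_memory // default_memory)  # 20
```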
average-finland-92144
11/22/2023, 5:22 PMdry-oxygen-50969
11/22/2023, 5:22 PMdry-oxygen-50969
11/22/2023, 5:23 PMdry-oxygen-50969
11/22/2023, 5:23 PMcluster_resources segment?
average-finland-92144
11/22/2023, 5:24 PMdry-oxygen-50969
11/22/2023, 5:24 PMdry-oxygen-50969
11/22/2023, 5:27 PMf294923173c7444c39a1-n0-0 0/1 OOMKilled 0 32s
tar: Removing leading `/' from member names
And the hello_world.py from flytesnacks (though I've tried others just in case, and they all fail in the same way)
# %% [markdown]
#
# # Hello, World!
#
# ```{eval-rst}
# .. tags:: Basic
# ```
#
# Let's write a Flyte {py:func}`~flytekit.workflow` that invokes a
# {py:func}`~flytekit.task` to generate the output "Hello, World!".
#
# Flyte tasks are the core building blocks of larger, more complex workflows.
# Workflows compose multiple tasks – or other workflows –
# into meaningful steps of computation to produce some useful set of outputs or outcomes.
#
# To begin, import `task` and `workflow` from the `flytekit` library.
# %%
from flytekit import task, workflow


# %% [markdown]
# Define a task that produces the string "Hello, World!".
# Simply use the `@task` decorator to annotate the Python function.
# %%
@task
def say_hello() -> str:
    return "Hello, World!"


# %% [markdown]
# You can handle the output of a task in the same way you would with a regular Python function.
# Store the output in a variable and use it as a return value for a Flyte workflow.
# %%
@workflow
def hello_world_wf() -> str:
    res = say_hello()
    return res


# %% [markdown]
# Run the workflow by simply calling it like a Python function.
# %%
if __name__ == "__main__":
    print(f"Running hello_world_wf() {hello_world_wf()}")
# %% [markdown]
# Next, let's delve into the specifics of {ref}`tasks <task>`,
# {ref}`workflows <workflow>` and {ref}`launch plans <launch_plan>`.
average-finland-92144
11/22/2023, 5:31 PM
task_resources:
  defaults:
    cpu: 1000m
    memory: 1000Mi
    storage: 1000Mi
  limits:
    storage: 2000Mi
dry-oxygen-50969
11/22/2023, 5:33 PMaverage-finland-92144
11/22/2023, 5:34 PMaverage-finland-92144
11/22/2023, 5:34 PMdry-oxygen-50969
11/22/2023, 5:34 PMaverage-finland-92144
11/22/2023, 5:35 PM"Action": [
"s3:DeleteObject*",
"s3:GetObject*",
"s3:ListBucket",
"s3:PutObject*"
],
"Resource": [
"arn:aws:s3:::<your-S3-bucket>*",
"arn:aws:s3:::<your-S3-bucket>*/*"
],
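The fragment above narrows the role from `s3:*` on every resource to four action patterns on the Flyte buckets. A minimal sketch that assembles it into a complete policy document, assuming a shared bucket-name prefix; the function `worker_s3_policy` and the prefix `flyte-` (chosen to cover `flyte-metadata` and `flyte-userdata` from the values file) are illustrative assumptions:

```python
import json


def worker_s3_policy(bucket_prefix: str) -> dict:
    """Build an IAM policy limited to buckets whose names start with the prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:DeleteObject*",
                    "s3:GetObject*",
                    "s3:ListBucket",
                    "s3:PutObject*",
                ],
                "Resource": [
                    # Bucket-level ARN (needed for ListBucket) ...
                    f"arn:aws:s3:::{bucket_prefix}*",
                    # ... and object-level ARN (needed for Get/Put/Delete).
                    f"arn:aws:s3:::{bucket_prefix}*/*",
                ],
            }
        ],
    }


print(json.dumps(worker_s3_policy("flyte-"), indent=2))
```

The printed JSON could then be saved to a file and attached to the role in place of `AmazonS3FullAccess`.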
dry-oxygen-50969
11/22/2023, 5:37 PMdry-oxygen-50969
11/22/2023, 5:48 PMflyte-workers role and I could get rid of that I imagine?
I will give removing it a shot when I get back to my computer in a bit, but my guess is that the resource configuration fixed the issue.
average-finland-92144
11/22/2023, 5:57 PMdefault SA used by the workers. And that seems a bit off in terms of self-contained security policies, sharing IAM roles with multiple SAs? The end result could be the same and the idea with FTHW is to provide a quickstart, but now I guess we need to rethink it so it helps set up a production-grade environment.
I've been iterating recently on a reference implementation built with Terraform that should incorporate all these recommendations: https://github.com/unionai-oss/deploy-flyte/tree/main/environments/aws