# ask-the-community
f
Hello, I am deploying a new Flyte cluster with the Helm chart. I am able to run pyflyte run --remote, but the workflow fails to start remotely, and the error in propeller suggests it is not able to write to the S3 bucket. From the UI:
Workflow[customer-ds:development:ml_project_1.test_env_vars.wf] failed. RuntimeExecutionError: max number of system retry attempts [51/50] exhausted. Last known status message: failed at Node[start-node]. CausedByError: Failed to store workflow inputs (as start node), caused by: Failed to write data [0b] to path [metadata/propeller/customer-ds-development-f829d772e914540dfa24/start-node/data/0/outputs.pb].: PutObject, putting object: AccessDenied: Access Denied
	status code: 403, request id: TBZ9Z342JT9MVTJP, host id: OCexqZHh7PUeIh7kQWMjmZ3gAcE98R1OqtpGim/Awc66m5UgH+JdnlMmxmgRi70ZbRfCKITm7RG+aXq/AKy5QQ==
From propeller logs:
│ {"json":{"exec_id":"aggb47m8lth9qgr8m4vr","node":"start-node","ns":"customer-ds-development","res_ver":"991597274","routine":"worker-28","wf":"customer-ds:development:ml_project_1.test_env_vars.wf"},"level":"error","msg":"Failed to wr │
│ {"json":{"exec_id":"aggb47m8lth9qgr8m4vr","node":"start-node","ns":"customer-ds-development","res_ver":"991597274","routine":"worker-28","wf":"customer-ds:development:ml_project_1.test_env_vars.wf"},"level":"error","msg":"Failed to wr │
│ {"json":{"exec_id":"aggb47m8lth9qgr8m4vr","ns":"customer-ds-development","res_ver":"991597274","routine":"worker-28","wf":"customer-ds:development:ml_project_1.test_env_vars.wf"},"level":"error","msg":"Error when trying to reconcile w │
│ E1016 18:56:14.532983       1 workers.go:102] error syncing 'customer-ds-development/aggb47m8lth9qgr8m4vr': failed at Node[start-node]. CausedByError: Failed to store workflow inputs (as start node), caused by: Failed to write data [0 │
│     status code: 403, request id: W93BW8RFPXS6Y2CF, host id: v0uNuIeisj96rMbYNFL5W1KpIikl6BixP2rEyDOqTHLx0BFmSSt9bAhJewmTYSNPUMuWyNlypS8=                                                                                                  │
│ {"json":{"exec_id":"aggb47m8lth9qgr8m4vr","node":"start-node","ns":"customer-ds-development","res_ver":"991597769","routine":"worker-32","wf":"customer-ds:development:ml_project_1.test_env_vars.wf"},"level":"error","msg":"Failed to wr │
│ {"json":{"exec_id":"aggb47m8lth9qgr8m4vr","node":"start-node","ns":"customer-ds-development","res_ver":"991597769","routine":"worker-32","wf":"customer-ds:development:ml_project_1.test_env_vars.wf"},"level":"error","msg":"Failed to wr │
│ {"json":{"exec_id":"aggb47m8lth9qgr8m4vr","ns":"customer-ds-development","res_ver":"991597769","routine":"worker-32","wf":"customer-ds:development:ml_project_1.test_env_vars.wf"},"level":"error","msg":"Error when trying to reconcile w │
│ E1016 18:56:24.587487       1 workers.go:102] error syncing 'customer-ds-development/aggb47m8lth9qgr8m4vr': failed at Node[start-node]. CausedByError: Failed to store workflow inputs (as start node), caused by: Failed to write data [0 │
│     status code: 403, request id: 29EX1TAB8BQWV0W6, host id: T/Ifpg+KMsXYjyoxjbL4Y4n01aBJXr6JMqwUncTkvimyRbWJmZwO3CqNWNxXPzcdj527sqJSp2yVPz/mJ1d1TN+iJgxAum6O
Note that each host id is different. When the local workflow is submitted via pyflyte run --remote, the Flyte server is able to write to the same S3 bucket at
/metadata/customer-ds/development/f829d772e914540dfa24/
Why can't propeller write to /metadata/propeller/customer-ds-development-f829d772e914540dfa24/start-node/data/0/outputs.pb? Could someone shed some light on this? Thank you very much!
s
Have you double-checked the credentials you specified in the flyte binary or core configmap?
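For reference, those credentials usually come from the storage section of the Helm values, which renders into the configmap. A minimal sketch for an S3-backed flyte-binary deployment is below; the bucket name and region are placeholders, and the exact key layout can differ between chart versions and between flyte-binary and flyte-core, so verify against the values.yaml of the chart you actually deployed:

```yaml
# Illustrative sketch only -- key names follow the flyte-binary chart's
# configuration.storage layout; check your chart version's values.yaml.
configuration:
  storage:
    metadataContainer: my-flyte-bucket   # placeholder: your metadata bucket
    userDataContainer: my-flyte-bucket   # placeholder: your user-data bucket
    provider: s3
    providerConfig:
      s3:
        region: us-east-1                # placeholder region
        authType: iam                    # rely on the pod's IAM role rather
                                         # than static access keys
```

With authType set to iam, the write permission comes entirely from the IAM role attached to the propeller service account, so a 403 here usually points at the role rather than at the config values themselves.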
f
Hi @Samhita Alla, sorry, I didn't see your reply until now. The issue was caused by a wrong IAM role config on my end. Thanks!
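For anyone who lands here with the same 403: the role that propeller assumes needs read/write on the metadata bucket, including the metadata/propeller/ prefix. An illustrative (not authoritative) IAM policy, with the bucket name as a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "FlyteObjectReadWrite",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-flyte-bucket/*"
    },
    {
      "Sid": "FlyteBucketList",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-flyte-bucket"
    }
  ]
}
```

If the trust policy or the service-account annotation points at the wrong role (as in this thread), the policy above never applies, so it is worth checking both the role's permissions and which role the pod actually assumes.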