# flyte-support
g
Hey everyone. Related issue (cc @freezing-tailor-85994): https://flyte-org.slack.com/archives/CP2HDHKE1/p1740685806604829 I'm getting another AWS error when executing remote pipelines, albeit with a slightly different error message than in the issue above. I'm just trying to run the `hello_world_wf` from flytesnacks.
[1/1] currentAttempt done. Last Error: USER::
[aksmph8kg7gwsk2tsphb-n0-0] terminated with exit code (1). Reason [Error]. Message: 
b/python3.11/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flytekit/bin/entrypoint.py", line 736, in fast_execute_task_cmd
    _download_distribution(additional_distribution, dest_dir)
  File "/usr/local/lib/python3.11/site-packages/flytekit/core/utils.py", line 312, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flytekit/tools/fast_registration.py", line 310, in download_distribution
    FlyteContextManager.current_context().file_access.get_data(
  File "/usr/local/lib/python3.11/site-packages/flytekit/utils/asyn.py", line 113, in wrapped
    return self.run_sync(coro_func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flytekit/utils/asyn.py", line 106, in run_sync
    return self._runner_map[name].run(coro)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flytekit/utils/asyn.py", line 85, in run
    res = fut.result(None)
          ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.11/site-packages/flytekit/core/data_persistence.py", line 628, in async_get_data
    raise FlyteDownloadDataException(
flytekit.exceptions.system.FlyteDownloadDataException: SYSTEM:DownloadDataError: error=Failed to get data from <s3://XXXXXXXXXXXXXXX-bucket-YYYYYYYY/flytesnacks/development/77QTUUR4E7PXXXXXXXXX======/fastf41cbd66eae638b391385bc9d6c30e48.tar.gz> to ./ (recursive=False).

Original exception: [Errno 5] An error occurred (ValidationError) when calling the AssumeRoleWithWebIdentity operation: Request ARN is invalid
As suggested in the thread above, I have tried pinning `botocore==1.35.23` in the `customizing_dependencies/image_spec.py` example, with no luck.

This is confusing because I deployed on Feb 10 and ran some workflows then. Nothing has changed except that a month has passed, and now I am getting this error. So although a deployment issue is possible, I haven't changed anything. I am also unable to rerun tasks that previously succeeded; they fail with the same error. Any tips on where to look for issues in the code, or which components of the stack to test, would be greatly appreciated. Thank you!

Flyte version: 1.14.1
Flyte deployment: AWS EKS, but a hacky version of it because I have to use AWS CDK (no Terraform or manual deployment). Details are in this thread (thank you @average-finland-92144).

If my deployment might be the issue, I'm happy to tidy up this code and make a PR to the "Flyte the hard way" repo so we can get some fully working Flyte CDK code (this was the plan, as I think CDK users would find it useful and it would bring more people to Flyte). The main difference is that I am using the default service account everywhere, as created by the Helm chart, and have not created separate Flyte worker roles. As I say, this deployment was once working, so it isn't the first place I'd look for an issue. But if there's some obvious role or permission that this error points to, I can update the CDK and redeploy.
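One way to narrow this down, as a rough sketch only: `AssumeRoleWithWebIdentity` is called with a role ARN and a token file, which on EKS are normally supplied to the pod through the `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` environment variables. A possible (unconfirmed) reading of "Request ARN is invalid" is that the ARN reaching the SDK is empty or malformed. The helper name `check_web_identity_env` and the ARN pattern are illustrative assumptions, not part of flytekit or boto.

```python
import os
import re

# Assumed pattern for an IAM role ARN; the partition list covers the common cases only.
ROLE_ARN_RE = re.compile(r"^arn:(aws|aws-cn|aws-us-gov):iam::\d{12}:role/[\w+=,.@/-]+$")


def check_web_identity_env(env):
    """Return a list of readable problems found in the web-identity variables."""
    problems = []
    role_arn = env.get("AWS_ROLE_ARN")
    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")

    if not role_arn:
        problems.append("AWS_ROLE_ARN is not set")
    elif role_arn != role_arn.strip():
        problems.append("AWS_ROLE_ARN has leading or trailing whitespace")
    elif not ROLE_ARN_RE.match(role_arn):
        problems.append(f"AWS_ROLE_ARN does not look like a role ARN: {role_arn!r}")

    if not token_file:
        problems.append("AWS_WEB_IDENTITY_TOKEN_FILE is not set")
    elif not os.path.isfile(token_file):
        problems.append(f"token file not found: {token_file}")

    return problems
```

Calling `check_web_identity_env(os.environ)` from inside the task container (for example, from a throwaway task or a debug shell in the pod) returns an empty list when nothing obvious is wrong; it does not verify that the role exists or that its trust policy matches the service account.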
f
You have to pin boto 1.36 since there was a breaking change in 1.37 around async handling
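To confirm which versions actually ended up in the task image after pinning, something like the following can be run inside the container. This is a minimal sketch: it assumes "boto 1.36" refers to the `boto3`/`botocore` packages and that "below 1.37" is the constraint to check; the helper names are made up for illustration.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.36.5' into (1, 36, 5), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_below(version_text, ceiling):
    """True if the version sorts strictly below the ceiling tuple, e.g. (1, 37)."""
    return parse_version(version_text) < ceiling


def installed_versions(packages=("boto3", "botocore")):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        if version is None:
            print(f"{name}: not installed")
        else:
            status = "ok" if is_below(version, (1, 37)) else "1.37 or newer"
            print(f"{name}: {version} ({status})")
```

If the printed versions are not what was pinned, the image being run is probably not the one that was rebuilt with the pin.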