flat-waiter-82487
06/27/2025, 7:54 AM
FlyteDirectory.download()? We have an S3 directory with ~200k files, 2-20 kB each (which tbh is not THAT large by modern standards), and FlyteDirectory.download() fails due to:
An error occurred (RequestTimeTooSkewed) when calling the GetObject operation: The difference between the request time and the current time is too large.
gentle-tomato-480
06/27/2025, 10:58 AM
gentle-tomato-480
06/27/2025, 10:58 AM
flat-waiter-82487
06/27/2025, 10:59 AM
FlyteDirectory?
flat-waiter-82487
06/27/2025, 11:03 AM
flat-waiter-82487
06/27/2025, 12:51 PM
.download should manage the concurrency better and not allow the RequestTimeTooSkewed error. My guess is that under the hood lots of async coroutines are spawned and some of them time out somehow.
freezing-airport-6809
freezing-airport-6809
freezing-airport-6809
freezing-airport-6809
flat-waiter-82487
06/27/2025, 1:42 PM
flat-waiter-82487
06/27/2025, 1:43 PM
You could walk the directory
[3:35 PM]
And parallelize it
But that's manual, the FlyteDirectory abstraction should do it by itself 😄
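For reference, a minimal sketch of what the manual "walk and parallelize" route could look like with an explicit cap on in-flight requests. This is not how FlyteDirectory.download() works internally; `download_all`, `fetch_one` and `max_concurrency` are hypothetical names, and `fetch_one` stands in for whatever coroutine actually fetches a single S3 object. The motivation is an assumption not confirmed in this thread: S3 rejects requests whose signed timestamp is more than about 15 minutes off, so requests that are prepared early and then wait a long time in a queue could plausibly trigger RequestTimeTooSkewed.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def download_all(
    keys: Iterable[str],
    fetch_one: Callable[[str], Awaitable[bytes]],  # hypothetical per-object fetcher
    max_concurrency: int = 64,
) -> List[bytes]:
    """Fetch every key while keeping at most max_concurrency fetches in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(key: str) -> bytes:
        # fetch_one is only called once a slot is free, so a request is
        # created right before it is sent instead of sitting in a queue.
        async with semaphore:
            return await fetch_one(key)

    # All coroutines are scheduled at once, but the semaphore limits how many
    # are past the "async with" line at any moment. Results keep input order.
    return await asyncio.gather(*(bounded_fetch(key) for key in keys))
```

Called as `asyncio.run(download_all(keys, fetch_one, max_concurrency=32))`. With 200k keys this still creates 200k small task objects up front; feeding a fixed pool of workers from a queue would avoid that, at the cost of a bit more code.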
flat-waiter-82487
06/27/2025, 1:43 PM
freezing-airport-6809
freezing-airport-6809
flat-waiter-82487
06/27/2025, 1:58 PM
flat-waiter-82487
06/27/2025, 2:00 PM
loop_manager thing and tbh I don't know when "await" is happening - are all files discovered during "walk" scheduled at once and then just awaited? Is there a control of the concurrency there?)
famous-branch-28985
06/27/2025, 6:06 PM
famous-branch-28985
06/27/2025, 6:08 PM
famous-branch-28985
06/27/2025, 6:10 PM
freezing-airport-6809
flat-waiter-82487
07/09/2025, 7:07 AM
TAR the files and just download 1 file instead of 200k
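A minimal sketch of the tar approach, using only Python's standard tarfile module. The function names and paths are hypothetical, and how the archive gets uploaded and downloaded (for example as a single file output of a task) is left out because it depends on the workflow.

```python
import tarfile
from pathlib import Path


def pack_directory(src_dir: Path, archive_path: Path) -> Path:
    """Bundle everything under src_dir into one gzip-compressed tar archive."""
    # archive_path should live outside src_dir so the archive does not
    # try to include itself.
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname="." stores member paths relative to src_dir.
        tar.add(src_dir, arcname=".")
    return archive_path


def unpack_archive(archive_path: Path, dest_dir: Path) -> Path:
    """Extract the archive into dest_dir, recreating the directory tree."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Only extract archives you produced yourself or otherwise trust.
        tar.extractall(dest_dir)
    return dest_dir
```

The producer would call `pack_directory(...)` once and upload the result; the consumer downloads that one object and calls `unpack_archive(...)`, which replaces ~200k GetObject calls with a single one.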