Is there a way to speed up `FlyteDirectory.download()`?
# flyte-support
f
Is there a way to speed up `FlyteDirectory.download()`? We have an S3 directory with ~200k files, 2-20 kB each (which tbh is not THAT large by modern standards), and `FlyteDirectory.download()` fails with:
```
An error occurred (RequestTimeTooSkewed) when calling the GetObject operation: The difference between the request time and the current time is too large.
```
g
You could stream it
Not sure if it'd speed things up, but probably would help you avoid the error
f
Can you give a good example of that with `FlyteDirectory`?
f
Streaming seems to be for large files:
> But for large files, you can iterate through the contents of the stream

Regardless, `.download()` should manage the concurrency better and not allow the `RequestTimeTooSkewed` error - my guess is that under the hood lots of async coroutines are spawned and some of them somehow time out.
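A minimal sketch of the per-file streaming idea quoted above, assuming a flytekit version where `FlyteFile.open()` exposes the fsspec-backed stream; the task name, chunk size, and return value are illustrative only:

```python
# Sketch only: assumes FlyteFile.open() is available (fsspec-backed streaming).
from flytekit import task
from flytekit.types.file import FlyteFile


@task
def stream_copy(src: FlyteFile) -> int:
    total = 0
    # Read the remote object in chunks instead of downloading it to disk first.
    with src.open("rb") as r:
        while True:
            chunk = r.read(1 << 20)  # 1 MiB at a time
            if not chunk:
                break
            total += len(chunk)
    return total
```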
f
You could walk the directory
And parallelize it
Request time skew is indeed weird. Cc @thankful-minister-83577 have you seen this?
@flat-waiter-82487 I would have loved to give you v2, as the system has so much better control over IO.
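A rough sketch of that manual approach - list the prefix with fsspec and fetch the files with a bounded thread pool. `parallel_download`, the worker count, and the path handling are assumptions for illustration, not flytekit API:

```python
# Sketch: walk an S3 prefix with fsspec and download the objects in parallel.
import os
from concurrent.futures import ThreadPoolExecutor

import fsspec


def parallel_download(remote_dir: str, local_dir: str, max_workers: int = 32) -> None:
    # Resolve the filesystem and the protocol-stripped root path.
    fs, _, (root,) = fsspec.get_fs_token_paths(remote_dir)
    keys = fs.find(root)  # recursive listing of every object under the prefix
    os.makedirs(local_dir, exist_ok=True)

    def fetch(key: str) -> None:
        dst = os.path.join(local_dir, os.path.relpath(key, root))
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        fs.get(key, dst)

    # A bounded pool keeps the number of in-flight requests under control.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(fetch, keys))
```

Something like `parallel_download("s3://my-bucket/my-prefix", "/tmp/data")`, tuning `max_workers` to whatever the bucket and network tolerate.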
f
How hard will the migration from V1 to V2 be?
> You could walk the directory
> And parallelize it
But that's manual, the `FlyteDirectory` abstraction should do it by itself 😄
If I have to walk the dir, I don't need `FlyteDirectory` at all
f
Haha, `FlyteDirectory` has walk
It's also using fsspec underneath
f
Yeah, but this walk is implemented in a way that times out during download.
(It's not the walk itself, it's more likely something underneath that does the downloading - I've seen some async mumbo jumbo under there and decided not to spend much time unrolling it - there's this `loop_manager` thing and tbh I don't know when the "await" happens - are all the files discovered during the walk scheduled at once and then just awaited? Is there any control of the concurrency there?)
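For what it's worth, the usual way to cap that kind of fan-out is a semaphore around each transfer - a generic asyncio pattern, not a description of what flytekit actually does internally; `download_one` is a hypothetical stand-in for the real per-file transfer:

```python
# Generic pattern for bounding coroutine fan-out; not flytekit internals.
import asyncio


async def download_one(key: str) -> None:
    """Hypothetical per-file download coroutine (stand-in for the real transfer)."""
    ...


async def download_all(keys: list[str], max_concurrency: int = 64) -> None:
    # The semaphore caps how many transfers are in flight at any moment,
    # instead of awaiting all ~200k coroutines at once.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(key: str) -> None:
        async with sem:
            await download_one(key)

    await asyncio.gather(*(bounded(k) for k in keys))
```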
f
We use this to download 70k+ files just fine
❤️ 2
I have come across that error before tho 😬
😬 1
f
@flat-waiter-82487 did that get fixed?
f
No, I've decided to tar the files and just download 1 file instead of 200k.
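For anyone who lands here later, a minimal sketch of that workaround: the producing task writes one tar archive and returns a single `FlyteFile`, and the consuming task downloads and extracts it. Task names and paths are made up for illustration:

```python
# Sketch of the tar workaround: ship one archive instead of ~200k objects.
import os
import tarfile
import tempfile

from flytekit import task
from flytekit.types.file import FlyteFile


@task
def produce_archive() -> FlyteFile:
    work_dir = tempfile.mkdtemp()
    # ... write the many small files under work_dir instead of returning a FlyteDirectory ...
    archive = os.path.join(tempfile.mkdtemp(), "dataset.tar")
    with tarfile.open(archive, "w") as tar:
        tar.add(work_dir, arcname=".")
    return FlyteFile(archive)


@task
def consume_archive(archive: FlyteFile) -> int:
    # One GetObject instead of 200k, then unpack locally.
    local_path = archive.download()
    out_dir = tempfile.mkdtemp()
    with tarfile.open(local_path, "r") as tar:
        tar.extractall(out_dir)
    return len(os.listdir(out_dir))
```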