# ask-the-community
a
I have a question related to automatic downloads of FlyteFiles/FlyteFolders. I have a large amount of data (~1TB) spread across ~15,000 files stored in GCS. I have a Flyte workflow that, as a first step, downloads these files to be operated on locally. I noticed that the download speed is around 20 MB/s and uses about 1-2 CPU cores. I did some testing, and when I manually invoke
gcloud storage cp -r …
inside of a Flyte workflow, I can download the exact same files at approximately 1.7 GB/s, and I see usage of ~20 CPU cores (the workflow requests 32 CPU cores). I observe the same behavior when uploading files to GCS as well. How can I keep using FlyteFiles/FlyteFolders and the automatic download/upload functionality, but get faster download/upload speeds?
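For reference, the manual workaround described above could be sketched roughly as below. This is only an illustration: `build_copy_command` and `copy_with_gcloud` are hypothetical helper names, and the actual source and destination paths are elided in the message, so the caller has to supply them.

```python
import subprocess
from typing import List


def build_copy_command(source: str, destination: str) -> List[str]:
    """Build the argument list for a recursive `gcloud storage cp`."""
    return ["gcloud", "storage", "cp", "-r", source, destination]


def copy_with_gcloud(source: str, destination: str) -> None:
    """Run the copy; raises CalledProcessError if gcloud exits non-zero."""
    subprocess.run(build_copy_command(source, destination), check=True)
```

This bypasses the automatic FlyteFile/FlyteFolder handling entirely, which is exactly the trade-off the question is trying to avoid.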
y
which version of flytekit are you on?
a
I’m not sure… how do I find out?
y
just
pip show flytekit
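The same check can be done from inside Python, which may be handy when the task runs in a container without shell access. A minimal sketch using only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the flytekit version, or None if flytekit is not installed here.
    print(installed_version("flytekit"))
```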
a
I think 1.2.7
y
would you be able to bump to a newer version and try again?
we’ve actually outsourced this entirely
in 1.5 it’s run by fsspec courtesy of gcsfs
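To give a rough idea of what an fsspec-based download looks like, here is a sketch of a recursive copy through an fsspec filesystem object. It is not flytekit's actual code path; `download_tree` is a hypothetical helper, the bucket and local paths in `main` are made up for illustration, and the "gs" protocol assumes gcsfs is installed with working credentials.

```python
from typing import Any


def download_tree(fs: Any, remote_dir: str, local_dir: str) -> None:
    """Recursively copy remote_dir into local_dir using an fsspec filesystem."""
    fs.get(remote_dir, local_dir, recursive=True)


def main() -> None:
    import fsspec  # third-party; the "gs" protocol also needs gcsfs installed

    fs = fsspec.filesystem("gs")
    # Hypothetical paths, for illustration only.
    download_tree(fs, "gs://example-bucket/data/", "/tmp/data/")
```

Whether this reaches the throughput seen with `gcloud storage cp` would have to be measured on the actual data.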
a
Yeah I saw that in the docs, I don’t have the ability to upgrade though, that decision is made outside my team
Hm, in some projects we’re using 1.4.2, but I guess that’s also not new enough?
k
cc @jeev 😄
j
lol
y
can you bump to 1.5 please?
1.4.2 should have been… assuming you were using the flytekit default image.