# flytekit

Pradithya Aria Pura

05/16/2023, 6:02 AM
In older versions of flytekit (pre-1.5) we could configure gsutil parallelism using this configuration. Is it correct to assume that I can set the value by setting the environment variable FLYTE_GCS_GSUTIL_PARALLELISM to true? Asking since we see slow FlyteDirectory transfers when there are a lot of files (~10k), and we just realized that parallelism is disabled by default. What would be the best approach to enable it globally? cc: @Lee Ning Jie Leon

Ketan (kumare3)

05/16/2023, 2:12 PM
Ohh now we use fsspec, I would have expected it to be faster. Cc @Yee / @jeev

jeev

05/16/2023, 3:14 PM
@Pradithya Aria Pura what flytekit version?

Pradithya Aria Pura

05/16/2023, 3:15 PM
I am using 1.2.11, still stuck with < 1.3 due to protobuf version

jeev

05/16/2023, 3:16 PM
FLYTE_GCP_GSUTIL_PARALLELISM
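
For context, here is a minimal sketch of what a flag like this typically controls: gsutil's top-level `-m` option, which runs the copy in parallel. The helper names (`parallelism_enabled`, `build_gsutil_copy`) are hypothetical and this is not flytekit's actual code. Only the environment variable name comes from the thread; the accepted truthy spellings and the example paths are assumptions.

```python
import os

ENV_VAR = "FLYTE_GCP_GSUTIL_PARALLELISM"  # name taken from the thread


def parallelism_enabled(environ=None):
    """Return True when the flag holds a truthy string (spellings are assumed)."""
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR, "false").strip().lower() in {"1", "true", "yes"}


def build_gsutil_copy(src, dst, parallel):
    """Build a recursive gsutil copy command; -m enables parallel transfers."""
    cmd = ["gsutil"]
    if parallel:
        cmd.append("-m")  # top-level option, so it precedes the sub-command
    cmd.extend(["cp", "-r", src, dst])
    return cmd


if __name__ == "__main__":
    # Hypothetical bucket and local path, for illustration only.
    print(build_gsutil_copy("gs://my-bucket/images", "/tmp/images", parallelism_enabled()))
```

With the variable unset, the sketch falls back to a plain sequential `gsutil cp -r`, which matches the "disabled by default" behaviour described above.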

Pradithya Aria Pura

05/16/2023, 3:20 PM
Haven’t tested it yet. Still figuring out how to enable it globally so that all workflows get the benefit. Any suggestions?

jeev

05/16/2023, 3:21 PM
what about default env var in propeller plugin config?

Pradithya Aria Pura

05/16/2023, 3:24 PM
Do you mean this?

jeev

05/16/2023, 3:24 PM
yes!

Pradithya Aria Pura

05/16/2023, 3:25 PM
Got it, thanks! Will try and update in this thread!

Ketan (kumare3)

05/16/2023, 3:25 PM
So after flytekit 1.5 you should not need it

Pradithya Aria Pura

05/16/2023, 3:26 PM
yeah, hopefully we can reach that point asap 🤞

Ketan (kumare3)

05/16/2023, 3:37 PM
Also let us know about 1.5 and how it’s working etc

Pradithya Aria Pura

05/17/2023, 3:22 AM
It works! And it scales with the number of CPUs too. Previously it took ~17 minutes to copy 13k images; now it takes ~5 minutes with 4 CPU cores.
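
To put those numbers in perspective, here is a small worked calculation of throughput and speedup. The inputs are the approximate figures quoted in the message above, so the results are rough estimates rather than measurements.

```python
FILES = 13_000        # "13k images" from the message above
BEFORE_MINUTES = 17   # approximate, parallelism disabled
AFTER_MINUTES = 5     # approximate, parallelism enabled on 4 CPU cores


def files_per_second(files, minutes):
    """Average transfer rate over the whole copy."""
    return files / (minutes * 60)


before = files_per_second(FILES, BEFORE_MINUTES)
after = files_per_second(FILES, AFTER_MINUTES)
speedup = BEFORE_MINUTES / AFTER_MINUTES

print(f"{before:.1f} files/s -> {after:.1f} files/s, {speedup:.1f}x faster")
# prints: 12.7 files/s -> 43.3 files/s, 3.4x faster
```

A roughly 3.4x gain on 4 cores is consistent with the observation that the transfer scales with CPU count.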

jeev

05/17/2023, 3:27 AM
you can further tune it too if you can mount a boto.cfg into the container: https://medium.com/@duhroach/gcs-read-performance-of-large-files-bd53cfca4410
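
As a rough illustration of that kind of tuning, the sketch below renders a boto config with gsutil's `parallel_process_count` and `parallel_thread_count` settings in the `[GSUtil]` section. The helper name `render_boto_cfg` and the values 4 and 8 are hypothetical, chosen only for the example. How the file is mounted into the container, and whether gsutil is pointed at it through the `BOTO_CONFIG` environment variable, depends on your deployment.

```python
import configparser
import io


def render_boto_cfg(process_count, thread_count):
    """Render a minimal boto config that tunes gsutil parallelism."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option and section names exactly as written
    cfg["GSUtil"] = {
        "parallel_process_count": str(process_count),
        "parallel_thread_count": str(thread_count),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Illustrative values only; tune against your own workload.
    print(render_boto_cfg(process_count=4, thread_count=8))
```

These settings only take effect when gsutil runs with `-m`, so they complement the parallelism flag rather than replace it.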

Ketan (kumare3)

05/17/2023, 3:27 AM
In the new version, @Pradithya Aria Pura, I would recommend using the streaming API
The same method can be accelerated, as it will use very little disk and memory

Pradithya Aria Pura

05/17/2023, 3:52 AM
Noted, will keep this in mind. Thanks @jeev, this is really useful!