# flyte-support
brainy-nail-23390
Hey! I have a Flyte deployment configured to use remote storage via `gcs`. I have a task that returns a `list[pd.DataFrame]` with 50,000 entries, which gets registered as raw data. The task logic itself takes < 1 min to run locally, but remote serialisation takes forever (on the order of 1h 30min) at this magnitude, while using very little CPU and very little network TX (as per the attached). This section seems to imply that the task container itself does the publishing to GCS, but I went through the documentation (and a bit of the source code) and couldn't quickly work out whether that's the case or how to speed it up. Any advice? If it is a FlytePropeller bottleneck, I'm aware of these config vars: is it just a case of adding more concurrency via rate, workers, etc. in FlytePropeller?
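For concreteness, a minimal sketch (hypothetical task name, toy data) of the shape in question: a task annotated to return `list[pd.DataFrame]`, where each element is offloaded to the configured raw output store (GCS here) as part of output serialisation.

```python
# Minimal sketch, hypothetical names: the task body is cheap, but flytekit
# serialises and uploads each DataFrame in the returned list as a separate
# offloaded object after the task function returns.
import pandas as pd
from flytekit import task


@task
def make_frames(n: int = 50_000) -> list[pd.DataFrame]:
    return [pd.DataFrame({"x": [i]}) for i in range(n)]
```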
high-accountant-32689
@brainy-nail-23390, can you confirm which version of flytekit you're running? In flytekit 1.14.0 we shipped an improvement to the type engine that affects this exact case (i.e. a task that returns a list of offloaded values). cc: @thankful-minister-83577
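For example, a quick way to confirm which flytekit is actually in the task image / local environment (just a sketch):

```python
# flytekit exposes its version; 1.14.0+ includes the type-engine improvement.
import flytekit

print(flytekit.__version__)
```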
brainy-nail-23390
Hey @high-accountant-32689, oh cool. We're using flytekit 1.13.8 at the moment. Would using flytekit 1.14.0 with the Helm chart at 1.13.2 be sufficient? And am I correct in understanding that, if it's just a flytekit update, the bottleneck is in the task container rather than in FlytePropeller?
thankful-minister-83577
If you're upgrading flytekit to 1.14.0, please upgrade the backend to 1.14 as well (otherwise dataclasses will not work; there's a flag to get around it, but it's better to just upgrade).
This will run through that list roughly in parallel, but it will still be 50k separate uploads (they'll just be happening concurrently).
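Conceptually (a sketch only, not flytekit's actual implementation), the behaviour is like the following: the per-element uploads overlap in time, but the number of uploads is still one per list element.

```python
# Conceptual sketch only: concurrent per-element uploads. 50k elements still
# means 50k individual uploads; they just happen at the same time.
import asyncio


async def upload_one(item) -> str:
    ...  # placeholder: serialise `item` and upload it, returning its URI


async def upload_all(items: list) -> list[str]:
    return await asyncio.gather(*(upload_one(i) for i in items))
```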
brainy-nail-23390
Alright thanks @thankful-minister-83577, will have a stab at doing this tomorrow