brainy-nail-23390
12/10/2024, 4:22 PM
gcs
- I have a task that returns a list[pd.DataFrame] of 50,000 items, which gets registered as raw data. The task logic itself takes < 1 min to run locally, but remote serialisation takes on the order of 1h30m at this scale, while using very little CPU and little network TX (as per the attached). This section seems to imply that the task container itself publishes to GCS, but I went through the documentation (and a bit of source code) and couldn't quickly work out whether that is the case or how to speed it up. Any advice?
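For a rough sense of scale, the figures in the message can be turned into an average cost per DataFrame. This is only back-of-the-envelope arithmetic on the numbers quoted (50,000 DataFrames, roughly 1h30m); it assumes the items are handled one after another, which is not confirmed here.

```python
# Average time spent per DataFrame, using the figures quoted in the message.
# Assumption: items are serialised and uploaded one at a time.
n_frames = 50_000
total_seconds = 90 * 60  # "on the order of 1h30mins"

seconds_per_frame = total_seconds / n_frames
print(f"{seconds_per_frame * 1000:.0f} ms per DataFrame")  # 108 ms per DataFrame
```

A per-item cost of about a tenth of a second, combined with low CPU and network use, would be consistent with per-object request latency dominating rather than data volume. That is an inference from the numbers, not something established in this thread.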
If it is a FlytePropeller bottleneck, I'm aware of these config vars. Is it just a case of adding more concurrency (rate, workers, etc.) to FlytePropeller?
high-accountant-32689
12/10/2024, 4:53 PM
brainy-nail-23390
12/10/2024, 6:07 PM
thankful-minister-83577
thankful-minister-83577
brainy-nail-23390
12/10/2024, 10:26 PM