Hello everyone 👋
I have a question about map_task. I want to use map_task with a task that returns a list of dataframes, so the result of the map_task will be a list of lists, and I want to flatten it into a single list. Because there is a large amount of data, I don't want to do the flattening in a single task, to avoid memory problems. Any suggestions? Is there another way to do this? Thank you in advance 🙏
Example code:
if it's hard to flatten in a single task, is it possible to safely return them from the map task at all?
I think it might be better to write a pickled file somewhere in the cloud and then write a merge-sort style dynamic task that merges them 🤔
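A rough sketch of the merge-sort style idea, in plain Python: instead of concatenating every list in one step, neighbouring lists are merged pairwise, level by level, so no single merge step handles more than two inputs. The names `merge_pair` and `tree_flatten` are hypothetical, and this version keeps everything in memory purely for illustration; in the setup suggested above, each `merge_pair` call would presumably be its own (dynamic) task that reads two pickled files and writes one merged file.

```python
from typing import List, TypeVar

T = TypeVar("T")


def merge_pair(left: List[T], right: List[T]) -> List[T]:
    # One merge step (hypothetical): combine exactly two partial lists, preserving order.
    return left + right


def tree_flatten(chunks: List[List[T]]) -> List[T]:
    """Flatten a list of lists by merging neighbours pairwise, level by level.

    Each level roughly halves the number of chunks, like the merge phase of
    merge sort, so no single step combines more than two inputs.
    """
    if not chunks:
        return []
    level = chunks
    while len(level) > 1:
        next_level = []
        # Merge neighbours (0, 1), (2, 3), ...
        for i in range(0, len(level) - 1, 2):
            next_level.append(merge_pair(level[i], level[i + 1]))
        # An odd trailing chunk is carried up to the next level unchanged.
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level
    return level[0]


if __name__ == "__main__":
    print(tree_flatten([[1, 2], [3], [4, 5], [6]]))  # → [1, 2, 3, 4, 5, 6]
```

With N inner lists this takes about log2(N) levels, and the items within each level are independent of each other, which is what would make them candidates for parallel tasks.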
thankful-minister-83577
08/31/2023, 7:04 PM
if these are all dataframes, then they should be offloaded to s3…
if you’re just flattening them, and not reading their contents, it should not matter.
(if it does then that’s a bug in flytekit that should be fixed)
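To illustrate why flattening should be cheap once the dataframes are offloaded: the flatten step only rearranges references and never opens them. The sketch below models each offloaded dataframe as a hypothetical `RemoteRef` holding a URI (the `s3://example-bucket/...` paths are made up). It is plain Python, not flytekit's actual types; in a real workflow the function would presumably be decorated with flytekit's `@task` and take the map_task output as its input, which is left out here so the sketch runs without flytekit.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RemoteRef:
    """Hypothetical stand-in for an offloaded dataframe: only a pointer to blob storage."""
    uri: str


def flatten(nested: List[List[RemoteRef]]) -> List[RemoteRef]:
    # Concatenate the inner lists in order. Only the small reference objects are
    # placed into the new list; the data behind each URI is never downloaded.
    return list(itertools.chain.from_iterable(nested))


if __name__ == "__main__":
    mapped_output = [
        [RemoteRef("s3://example-bucket/part-0/a"), RemoteRef("s3://example-bucket/part-0/b")],
        [RemoteRef("s3://example-bucket/part-1/a")],
    ]
    flat = flatten(mapped_output)
    print([ref.uri for ref in flat])
    # The flattened list holds the very same objects, not copies.
    print(flat[0] is mapped_output[0][0])  # → True
```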
prehistoric-carpenter-16793
08/31/2023, 7:28 PM
oh indeed, they are just references to a bucket! I wrote my own task that flattens them 🙂 Thanks 🙏