Howdy 👋.
Question around pandas dataframe outputs/inputs and `map_task`. Sometimes we have many many outputs from a previous task as inputs to `map_task`, which we have found to be slow on occasion and I think we sometimes see limitations in max number of inputs/outputs (is there a limit?). I have been enjoying using dataframes as outputs/inputs for tasks and was wondering if it would ever make sense to add dataframe input/output support for `map_task`?
Followup thought: would it ever make sense to create batch support for `map_task`? For example, a batch size of 100 would mean that a single pod would stay up and iterate over 100 input elements. I suppose this can already easily be accomplished by constructing the tasks/inputs accordingly.
tall-lock-23197
11/03/2023, 7:16 AM
> is there a limit?
Yes, 5000 should work. If you go beyond this, it might not scale very well. In that case, it'd be a wise approach to adopt hierarchical map tasks (you can use a dynamic workflow to create multiple map tasks.)
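As a rough illustration of the hierarchical approach, the input list could first be split into chunks no larger than the limit, with each chunk then handed to its own map task from a dynamic workflow. The sketch below covers only the splitting step in plain Python; `split_into_chunks` and `MAX_MAP_INPUTS` are hypothetical names, not part of flytekit, and the Flyte wiring is left out.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Guideline figure quoted in this thread; an assumption, not a flytekit constant.
MAX_MAP_INPUTS = 5000


def split_into_chunks(items: Sequence[T], chunk_size: int = MAX_MAP_INPUTS) -> List[List[T]]:
    """Split items into consecutive chunks of at most chunk_size elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(items[i:i + chunk_size]) for i in range(0, len(items), chunk_size)]


inputs = list(range(12_000))
chunks = split_into_chunks(inputs)

# Each chunk would become the input list of one map task
# (launched from a dynamic workflow, not shown here).
print([len(c) for c in chunks])  # [5000, 5000, 2000]
```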
> if it would ever make sense to add dataframe input/output support for `map_task`
Do you mean a list of dataframe inputs?
> For example, a batch size of 100 would mean that a single pod would stay up and iterate over 100 input elements.
You should be able to accomplish this within a single task, just by looping over the batch.
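A minimal sketch of that batching idea, in plain Python so it runs without Flyte: the task body takes a whole batch and loops over it, and the caller groups the inputs into batches of 100 beforehand. `process_one`, `process_batch`, and `BATCH_SIZE` are hypothetical names used for illustration only.

```python
from typing import List

BATCH_SIZE = 100  # batch size from the example in the question


def process_one(x: int) -> int:
    """Hypothetical per-element work; stands in for the real computation."""
    return x * x


def process_batch(batch: List[int]) -> List[int]:
    """Body of a single task: iterate over the batch instead of using one pod per element."""
    return [process_one(x) for x in batch]


inputs = list(range(250))
batches = [inputs[i:i + BATCH_SIZE] for i in range(0, len(inputs), BATCH_SIZE)]

# Here the batches are processed sequentially; in Flyte, the fan-out over
# `batches` would be done by mapping the batch-level task over this list.
results = [process_batch(b) for b in batches]
print(len(batches), [len(r) for r in results])  # 3 [100, 100, 50]
```

With this layout the number of pods scales with the number of batches rather than the number of elements, at the cost of coarser retries: if one element fails, the whole batch is rerun.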
bored-beard-89967
11/03/2023, 12:47 PM
I think a list of dataframes works at the moment, no?
bored-beard-89967
11/03/2023, 12:49 PM
I mean using a row in a df as an input. So, instead of providing a list of inputs it would be a single df. The number of tasks would equal the number of rows.
tall-lock-23197
11/03/2023, 12:58 PM
> I think a list of dataframes works at the moment no?
It has to, yes.
> I mean using a row in a df as an input.
Gotcha. I'm not sure if that's something we can consider as a high priority item. If you're willing to contribute, please feel free to create an issue, and the team will let you know what they think of it.
bored-beard-89967
11/03/2023, 1:03 PM
Yeah, definitely not a high priority item. Great. I'll consider creating an issue.