# ask-the-community
l
Howdy 👋. Question around pandas dataframe outputs/inputs and `map_task`. Sometimes we have many, many outputs from a previous task as inputs to `map_task`, which we have found to be slow on occasion, and I think we sometimes see limitations in the max number of inputs/outputs (is there a limit?). I have been enjoying using dataframes as outputs/inputs for tasks and was wondering if it would ever make sense to add dataframe input/output support for `map_task`?

Follow-up thought: would it ever make sense to create batch support for `map_task`? For example, a batch size of 100 would mean that a single pod would stay up and iterate over 100 input elements. I suppose this can already easily be accomplished by constructing the tasks/inputs accordingly.
s
> is there a limit?
Yes, 5000 should work. If you go beyond this, it might not scale very well. In that case, it'd be a wise approach to adopt hierarchical map tasks (you can use a dynamic workflow to create multiple map tasks).

> if it would ever make sense to add dataframe input/output support for `map_task`
Do you mean a list of dataframe inputs?

> For example, a batch size of 100 would mean that a single pod would stay up and iterate over 100 input elements.
You should be able to accomplish this within a single task, just by looping over the batch.
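A rough sketch of the hierarchical pattern (untested, task names made up): a dynamic workflow splits the inputs into chunks of at most 5000 and launches one map task per chunk.
```python
from typing import List

from flytekit import dynamic, map_task, task, workflow


@task
def process_item(x: int) -> int:
    return x * 2


@dynamic
def hierarchical_map(inputs: List[int], chunk_size: int) -> List[List[int]]:
    # Inside a dynamic workflow the inputs are materialized, so we can
    # slice them and launch one map task per chunk of <= chunk_size items.
    outputs = []
    for i in range(0, len(inputs), chunk_size):
        chunk = inputs[i : i + chunk_size]
        outputs.append(map_task(process_item)(x=chunk))
    return outputs


@workflow
def big_fan_out_wf(inputs: List[int]) -> List[List[int]]:
    return hierarchical_map(inputs=inputs, chunk_size=5000)
```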
l
I think a list of dataframes works at the moment, no?
I mean using a row in a df as an input. So, instead of providing a list of inputs, it would be a single df. The number of tasks would equal the number of rows.
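Roughly what I do today to get per-row fan-out, as a sketch (task names and the `value` column are made up, and it assumes numeric columns):
```python
from typing import Dict, List

import pandas as pd
from flytekit import map_task, task, workflow


@task
def to_rows(df: pd.DataFrame) -> List[Dict[str, float]]:
    # Explode the dataframe into one plain-dict input per row.
    return df.to_dict("records")


@task
def process_row(row: Dict[str, float]) -> float:
    return row["value"] * 2


@workflow
def per_row_wf(df: pd.DataFrame) -> List[float]:
    rows = to_rows(df=df)
    # One mapped task per dataframe row.
    return map_task(process_row)(row=rows)
```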
s
> I think a list of dataframes works at the moment, no?
It has to, yes.

> I mean using a row in a df as an input.
Gotcha. I'm not sure if that's something we can consider as a high priority item. If you're willing to contribute, please feel free to create an issue, and the team will let you know what they think of it.
l
Yeah, definitely not a high priority item. Great. I'll consider creating an issue.