strong-quill-48244
12/01/2022, 6:37 PM
map_task to map the chunks to equally many worker nodes. However, now I would like to add another task into the mix: a ShellTask that would prefetch data into the Flyte filesystem for the worker nodes to use in processing. The reasoning is that the data is currently fetched inside the processing loop, which creates a sizeable I/O bottleneck. The problem is that, according to the docs, one should not call another task from inside a mapped task. So I'm looking for a more flexible approach to distributing processing across multiple pods, one that would allow calling tasks from inside the worker nodes. I've looked into @dynamic and subworkflows. Which would be better, or is there a better option? Thanks a ton
strong-quill-48244
12/01/2022, 6:38 PM
When defining a map task, avoid calling other tasks in it. Flyte can't accurately register tasks that call other tasks. While Flyte will correctly execute a task that calls other tasks, it will not be able to give full performance advantages. This is especially true for map tasks.
magnificent-teacher-86590
12/01/2022, 7:23 PM
strong-quill-48244
12/01/2022, 7:55 PM
freezing-airport-6809
map(a -> b)
where a and b are individual tasks
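The shape of map(a -> b) can be sketched in plain Python. This is only an illustration of the idea, not Flyte API: `prefetch`, `process`, `chain`, and `map_chain` are hypothetical names, and a thread pool stands in for the worker pods. In Flyte, a and b would be separate tasks, and the per-chunk chain would be wired up with one of the options raised in the question.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def prefetch(chunk_id: int) -> List[int]:
    """Hypothetical stand-in for task a: fetch the data for one chunk."""
    return [chunk_id * 10 + i for i in range(3)]


def process(data: List[int]) -> int:
    """Hypothetical stand-in for task b: process the prefetched data."""
    return sum(data)


def chain(a: Callable, b: Callable) -> Callable:
    """Build the per-chunk pipeline a -> b as a single callable."""
    def pipeline(x):
        return b(a(x))
    return pipeline


def map_chain(chunks: List[int], a: Callable, b: Callable) -> List[int]:
    """Apply a -> b to every chunk concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(chain(a, b), chunks))


results = map_chain([0, 1, 2], prefetch, process)
```

Here a and b stay independent units, and neither calls the other; only the outer mapping knows that b consumes the output of a.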
magnificent-teacher-86590
12/01/2022, 8:26 PM
strong-quill-48244
12/01/2022, 9:00 PM
strong-quill-48244
12/01/2022, 9:03 PM
strong-quill-48244
12/01/2022, 9:05 PM
magnificent-teacher-86590
12/01/2022, 9:06 PM
magnificent-teacher-86590
12/01/2022, 9:06 PM