Ekku Jokinen
12/01/2022, 6:37 PM
map_task to map the chunks to an equal number of worker nodes. However, I would now like to add another task to the mix: a ShellTask that prefetches data into the Flyte filesystem for the worker nodes to use during processing. The reasoning is that the data is currently fetched inside the processing loop, which creates a sizeable I/O bottleneck. The problem is that, according to the docs, one should not call another task from inside a mapped task. So I’m looking for a more flexible way to distribute processing across multiple pods, one that would allow calling tasks from inside the worker nodes. I’ve looked into @dynamic and subworkflows. Which would be better, or is there a better option? Thanks a ton!
When defining a map task, avoid calling other tasks in it. Flyte can't accurately register tasks that call other tasks. While Flyte will correctly execute a task that calls other tasks, it will not be able to give full performance advantages. This is especially true for map tasks.
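For illustration only, here is a minimal sketch of what the @dynamic option mentioned in the question might look like. It assumes flytekit's `task` and `dynamic` decorators; `prefetch`, `process`, `process_all`, and the `/tmp/staged/` path are hypothetical names, and `prefetch` is a plain Python stand-in for the ShellTask. This is not a recommendation from the thread, just one way the fan-out could be written.

```python
from typing import List

try:
    from flytekit import dynamic, task
except ImportError:
    # Fallback (assumption): no-op decorators, so the control flow can still
    # be tried as plain Python when flytekit is not installed.
    def task(fn):
        return fn

    dynamic = task


@task
def prefetch(chunk: str) -> str:
    """Hypothetical stand-in for the ShellTask: returns where the chunk was staged."""
    return f"/tmp/staged/{chunk}"


@task
def process(path: str) -> str:
    """Hypothetical processing step that consumes the prefetched data."""
    return f"processed {path}"


@dynamic
def process_all(chunks: List[str]) -> List[str]:
    # The loop is unrolled when the dynamic workflow runs, producing one
    # prefetch -> process pair of task nodes per chunk.
    results = []
    for chunk in chunks:
        staged = prefetch(chunk=chunk)
        results.append(process(path=staged))
    return results


if __name__ == "__main__":
    print(process_all(chunks=["a", "b"]))
```

In this sketch no task calls another task from inside its own body; the chaining happens in the dynamic workflow, which is the part that differs from calling a task inside a mapped task.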
Jay Ganbat
12/01/2022, 7:23 PM
Ekku Jokinen
12/01/2022, 7:55 PM
Ketan (kumare3)
map(a -> b), where a and b are individual tasks
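To make the map(a -> b) notation concrete, here is a plain-Python sketch of the idea: two steps are chained into one per-chunk unit, and that unit is mapped over the chunks. It is not Flyte API; `compose`, `a`, `b`, and the paths are hypothetical, with `a` standing in for the prefetch step and `b` for the processing step.

```python
from typing import Callable, List


def compose(first: Callable, second: Callable) -> Callable:
    """Return a function that feeds the output of `first` into `second`."""
    return lambda x: second(first(x))


def a(chunk_id: int) -> str:
    # Hypothetical prefetch step: pretend the chunk was staged at this path.
    return f"/tmp/chunk-{chunk_id}"


def b(path: str) -> str:
    # Hypothetical processing step: consume the staged chunk.
    return f"processed:{path}"


# map(a -> b): apply the chained pair to every chunk independently.
results: List[str] = list(map(compose(a, b), range(3)))
print(results)
```

Each element goes through `a` and then `b` without depending on any other element, which is what would let the pairs be spread across separate workers.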
Jay Ganbat
12/01/2022, 8:26 PM
Ekku Jokinen
12/01/2022, 9:00 PM
Jay Ganbat
12/01/2022, 9:06 PM