Flyte enables production-grade orchestration for machine learning workflows and data processing created to accelerate local workflows to production.

For map task caching, does the entire map task have to finish before the task caches are written, or does it happen as each task finishes?

Does it only cache successful tasks, or also failed ones?

> Happens at each sub task
Just want to be accurate here. maptask caching happens at the subtask level when the entire maptask completes. For example, if the input is `[0,1,2]` and the subtasks for `0` and `1` are successful but `1` fails, once every task is complete the outputs for `0` and `1` will be cached.

In ArrayNode, the new experimental maptask implementation, each subtask will be cached as it completes. So in the above example, when subtask `0` completes it will be cached independently of when `1` completes.
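To make the difference concrete, here is a toy model of the two write policies described above. It is plain Python rather than Flyte code: `simulate_map_task`, the `failing` set and the `item * 10` output are all made up for illustration, and subtasks run one after another here even though a real map task runs them in parallel.

```python
from typing import List, Set, Tuple


def simulate_map_task(
    inputs: List[int], failing: Set[int], write_per_subtask: bool
) -> Tuple[List[List[int]], List[int]]:
    """Toy model of a cached map task; this is not Flyte code."""
    cache = {}    # what other executions could read
    pending = {}  # outputs held back until the whole map task completes
    visible_after_each = []  # cached keys visible after each subtask ends

    for item in inputs:
        if item not in failing:
            output = item * 10  # hypothetical stand-in for the subtask result
            if write_per_subtask:
                cache[item] = output    # ArrayNode-style: written right away
            else:
                pending[item] = output  # older style: written later
        visible_after_each.append(sorted(cache))

    # The map task has reached a terminal state, so flush held-back outputs.
    cache.update(pending)
    return visible_after_each, sorted(cache)


legacy = simulate_map_task([0, 1, 2], failing={2}, write_per_subtask=False)
array_node = simulate_map_task([0, 1, 2], failing={2}, write_per_subtask=True)
print(legacy)      # ([[], [], []], [0, 1])
print(array_node)  # ([[0], [0, 1], [0, 1]], [0, 1])
```

Both policies end up with the same cached items (`0` and `1`, never the failed `2`); the only difference in this sketch is when those entries become visible.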

you mean if 2 fails? ok, so all the tasks need to finish to cache.

What if you were to run a map_task with inputs [0, 1] and then again with [2]? Those finish and 0, 1 are cached. If you now run with [0, 1, 2], are 0 and 1 pulled from the cache?

t1: [0,1] start
t2: [2] start
t3: [0,1] finish (and cache)
t4: [0,1,2] start

IIUC this represents what you're asking? Then yes, at t4 0 and 1 should be read from cache. If the [2] run finishes and is cached at t3.5, then at t4 all the items should be read from cache.
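The timeline can be replayed with a small sketch, assuming cache entries are looked up per input item rather than per input list. This is a toy dictionary, not Flyte's cache, and `cache_hits` and `finish_run` are hypothetical helper names.

```python
from typing import Dict, List


def cache_hits(inputs: List[int], cache: Dict[int, int]) -> List[int]:
    """Return the items whose outputs are already cached when a run starts."""
    return [item for item in inputs if item in cache]


def finish_run(inputs: List[int], cache: Dict[int, int]) -> None:
    """Write outputs for a run that completed with every subtask succeeding."""
    for item in inputs:
        cache[item] = item * 10  # hypothetical stand-in for the real output


cache: Dict[int, int] = {}
# t1 and t2: the [0, 1] and [2] runs start; nothing is cached yet.
finish_run([0, 1], cache)                  # t3: [0, 1] finishes and is cached
hits_at_t4 = cache_hits([0, 1, 2], cache)  # t4: [0, 1, 2] starts
print(hits_at_t4)  # [0, 1]

# Variant: the [2] run finishes at t3.5, before t4.
finish_run([2], cache)
hits_with_t35 = cache_hits([0, 1, 2], cache)
print(hits_with_t35)  # [0, 1, 2]
```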

> you mean if 2 fails?
yes, exactly - thanks for catching this!

What happens if the map task is aborted when it's around 30% done, either manually or because of too many errors?

It looks like aborting means nothing gets written to cache, whether you abort manually or the maptask is aborted because some other task in the workflow failed. The maptask has to complete (either success or failure) for anything to be cached.

Again, with ArrayNode, subtasks are cached immediately on completion. So in the event of an abort, everything that already completed would be cached.
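A rough sketch of the abort case, under the same toy assumptions as the behaviour described in this thread (plain Python, not Flyte code; `cached_after_abort` and its parameters are invented for illustration): 3 of 10 subtasks finish, then the map task is aborted.

```python
from typing import List


def cached_after_abort(
    inputs: List[int], completed_before_abort: int, write_per_subtask: bool
) -> List[int]:
    """Return which items are cached if the map task is aborted part-way."""
    cache = {}    # entries other executions could read
    pending = {}  # outputs waiting for the map task to complete
    for item in inputs[:completed_before_abort]:
        target = cache if write_per_subtask else pending
        target[item] = item * 10  # hypothetical stand-in for the real output
    # Abort: the map task never completes, so `pending` is never flushed.
    return sorted(cache)


inputs = list(range(10))
print(cached_after_abort(inputs, 3, write_per_subtask=False))  # []
print(cached_after_abort(inputs, 3, write_per_subtask=True))   # [0, 1, 2]
```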