for map task caching, does the entire map task hav...
# ask-the-community
l
for map task caching, does the entire map task have to finish before the task caches are written? Or does it happen as each task is finished.
k
Happens at each sub task
l
does it only cache successful tasks or also failed
k
Successful only
d
Happens at each sub task
Just want to be accurate here. Maptask caching happens at the subtask level when the entire maptask completes. For example, if the input is `[0,1,2]` and the subtasks for `0` and `1` are successful but `1` fails, once every task is complete the outputs for `0` and `1` will be cached.
In ArrayNode, the new experimental maptask implementation, each subtask will be cached as it completes. So in the above example, when subtask `0` completes it will be cached independently of when `1` completes.
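To make the difference concrete, here is a toy model in plain Python. It is only a sketch of the behavior described above, not Flyte's implementation: `run_map`, `fails_on_two`, and the `cache_on_completion` flag are hypothetical names, the cache is a plain dict keyed by input value, and subtasks run one after another instead of in parallel. It records which keys are in the cache after each subtask, plus once more when the whole map is done.

```python
from typing import Callable, Dict, List


def fails_on_two(x: int) -> int:
    """Hypothetical subtask: squares its input, but fails for input 2."""
    if x == 2:
        raise ValueError("subtask 2 failed")
    return x * x


def run_map(
    inputs: List[int],
    fn: Callable[[int], int],
    cache: Dict[int, int],
    cache_on_completion: bool,
) -> List[List[int]]:
    """Run fn over inputs and return a snapshot of the cached keys after
    each subtask, followed by one final snapshot after the map completes."""
    snapshots: List[List[int]] = []
    pending: Dict[int, int] = {}  # successful outputs not yet written
    for x in inputs:
        try:
            out = fn(x)
        except ValueError:
            pass  # failed subtasks are never cached
        else:
            if cache_on_completion:
                cache[x] = out    # ArrayNode-style: write immediately
            else:
                pending[x] = out  # original maptask: hold until the end
        snapshots.append(sorted(cache))
    cache.update(pending)  # original maptask: write once everything is done
    snapshots.append(sorted(cache))
    return snapshots


legacy = run_map([0, 1, 2], fails_on_two, {}, cache_on_completion=False)
array_node = run_map([0, 1, 2], fails_on_two, {}, cache_on_completion=True)
print(legacy)      # [[], [], [], [0, 1]]
print(array_node)  # [[0], [0, 1], [0, 1], [0, 1]]
```

Both modes end with `0` and `1` cached and nothing for the failed `2`. The only difference is when the writes happen.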
l
you mean if 2 fails? ok, so all the tasks need to finish to cache. what if you were to run a map_task with inputs [0, 1] and then again with [2]. those finish and 0, 1 are cached. If you run now with [0,1,2], are 0, 1 pulled from the cache?
d
t1: [0,1] start
t2: [2] start
t3: [0,1] finish (and cache)
t4: [0,1,2] start
IIUC this represents what you're asking? Then yes, at t4 0 and 1 should be read from cache. If at t3.5 the 2 task finishes and is cached, then at t4 all the items should be read from cache.
you mean if 2 fails?
yes, exactly - thanks for catching this!
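The timeline can be sketched with a per-item cache lookup. This is a simplified model, not Flyte code: `run_with_cache` is a hypothetical helper, and the cache is a dict keyed by the input value alone, which stands in for whatever Flyte actually uses as the cache key.

```python
from typing import Dict, List, Tuple


def run_with_cache(inputs: List[int], cache: Dict[int, int]) -> List[Tuple[int, str]]:
    """Look up each item on its own; compute and store it on a miss."""
    results: List[Tuple[int, str]] = []
    for x in inputs:
        if x in cache:
            results.append((x, "hit"))
        else:
            cache[x] = x * x  # stand-in for running the subtask
            results.append((x, "miss"))
    return results


# Case A: at t4 only the [0,1] run has finished and been cached.
cache_a: Dict[int, int] = {}
run_with_cache([0, 1], cache_a)
t4_a = run_with_cache([0, 1, 2], cache_a)

# Case B: the [2] run also finished (t3.5) before t4.
cache_b: Dict[int, int] = {}
run_with_cache([0, 1], cache_b)
run_with_cache([2], cache_b)
t4_b = run_with_cache([0, 1, 2], cache_b)

print(t4_a)  # [(0, 'hit'), (1, 'hit'), (2, 'miss')]
print(t4_b)  # [(0, 'hit'), (1, 'hit'), (2, 'hit')]
```

The point is that lookups happen per item, so it does not matter that the earlier runs used different input lists.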
l
ok nice. thanks for the clarifications!
what happens if the map task gets aborted when it's like 30% done? either manually or from too many errors
does the 30% get written to cache?
d
It looks like aborting will mean nothing gets written to cache. That covers manually aborting, or the maptask being aborted because some other task in the workflow failed. The maptask has to complete (either succeed or fail) for anything to be cached.
Again, with ArrayNode, subtasks are cached immediately on completion. So in the event of an abort, everything that already completed would be cached.
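Here is the abort case as a toy model, under the same caveats: plain Python, sequential execution, and hypothetical names (`run_until_abort`, `abort_after`, `cache_on_completion`). It only illustrates the behavior described above and does not reflect how Flyte is implemented.

```python
from typing import Dict, List


def run_until_abort(
    inputs: List[int],
    cache: Dict[int, int],
    cache_on_completion: bool,
    abort_after: int,
) -> None:
    """Process inputs in order and abort once `abort_after` subtasks are done."""
    pending: Dict[int, int] = {}  # completed outputs not yet written
    for done, x in enumerate(inputs):
        if done == abort_after:
            return  # aborted: anything still in `pending` is dropped
        out = x * x  # stand-in for a successful subtask
        if cache_on_completion:
            cache[x] = out    # ArrayNode-style: already in the cache
        else:
            pending[x] = out  # original maptask: waits for completion
    cache.update(pending)  # only reached if the map task completes


legacy_cache: Dict[int, int] = {}
array_node_cache: Dict[int, int] = {}

# Ten inputs, aborted after three subtasks have finished (30% done).
run_until_abort(list(range(10)), legacy_cache, cache_on_completion=False, abort_after=3)
run_until_abort(list(range(10)), array_node_cache, cache_on_completion=True, abort_after=3)

print(sorted(legacy_cache))      # []
print(sorted(array_node_cache))  # [0, 1, 2]
```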