# ask-the-community
a
I have a use case that requires very fine-grained caching, and I was wondering if a dynamic workflow spawning thousands of tasks is okay?
• I have a pandas dataframe of 50k rows; each row contains a sentence that I want to apply expensive operations to (think of passing each sentence through an external LLM service).
• Across my experiments the order and contents of the dataframe can change, but I still want cache hits on the subset of sentences that have already been seen (for example, I have to shuffle and randomly split my dataset for validation).
Any ideas?
f
Hey! Why not a map task for this?
a
Interesting 🤔
• Do map tasks also support task-level caching?
• Are they able to support thousands of tasks without blowing up the graph?
f
I don't know about the second one; see https://flyte.org/blog/map-tasks-in-flyte. Caching works for them.
s
> Are they able to support thousands of tasks without blowing up the graph?
I believe so!
a
Just tested it out with 2k nodes and it worked perfectly 👍 gonna try out with 50k today 🤞