# ask-the-community

Archit Rathore

09/06/2023, 1:42 PM
I have a use case that requires very fine-grained caching and was wondering if a dynamic workflow spawning thousands of tasks is okay?
• I have a pandas dataframe of 50k items; each row contains a sentence that I want to apply expensive operations to (think of passing each sentence through an external LLM service).
• Across my experiments the order and contents of the dataframe can change, but I still want cache hits on the subset of sentences already seen (for example, I have to shuffle and randomly split my dataset for validation).
Any ideas?

Franco Bocci

09/06/2023, 1:46 PM
Hey! Why not a map task for this?

Archit Rathore

09/06/2023, 1:50 PM
Interesting 🤔
• Do map tasks also support task-level caching?
• Are they able to support thousands of tasks without blowing up the graph?

Franco Bocci

09/06/2023, 2:51 PM
Not sure about the second one; see https://flyte.org/blog/map-tasks-in-flyte. Caching does work for map tasks.

Samhita Alla

09/07/2023, 6:34 AM
> Are they able to support thousands of tasks without blowing up the graph?
I believe so!

Archit Rathore

09/07/2023, 1:13 PM
Just tested it out with 2k nodes and it worked perfectly 👍 gonna try out with 50k today 🤞