I have a use case that requires very fine-grained caching, and I was wondering whether a dynamic workflow spawning thousands of tasks is okay?
• I have a pandas dataframe of 50k rows, where each row contains a sentence that I want to run an expensive operation on (think passing each sentence through an external LLM service)
• Across my experiments, the order and contents of the dataframe can change, but I still want cache hits on the subset of sentences that have already been seen (for example, I have to shuffle and randomly split my dataset for validation) — roughly what I mean is sketched below
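To make the idea concrete, here is a minimal standalone sketch of the caching behavior I'm after, ignoring the workflow engine entirely. It assumes a local disk cache keyed on a content hash of each sentence; `expensive_op` and the `.sentence_cache` directory are placeholders I made up for illustration, not part of any real API:

```python
import hashlib
import json
from pathlib import Path

import pandas as pd

CACHE_DIR = Path(".sentence_cache")  # hypothetical local cache location
CACHE_DIR.mkdir(exist_ok=True)


def expensive_op(sentence: str) -> str:
    # Stand-in for the real call to the external LLM service.
    return sentence.upper()


def cached_op(sentence: str) -> str:
    # Key on the sentence content itself, so reordering, shuffling,
    # or re-splitting the dataframe still hits the cache per sentence.
    key = hashlib.sha256(sentence.encode("utf-8")).hexdigest()
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())["result"]
    result = expensive_op(sentence)
    path.write_text(json.dumps({"result": result}))
    return result


# 50k rows in practice; two here just to show the shape of the data.
df = pd.DataFrame({"sentence": ["hello world", "another sentence"]})
df["result"] = df["sentence"].map(cached_op)
```

The question is essentially whether I should express each `cached_op` call as its own task (thousands of tiny tasks, each individually cacheable) or whether there's a better pattern for this.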
Any ideas?