# ask-the-community
**Caching & workflow version** Hey guys, I have a question about how caching works with regard to workflow versions. Looking at the doc here, I originally thought that modifying the tasks in a workflow wouldn't invalidate the cache of upstream tasks. E.g.:
• I run the workflow: `n1 -> n2`
• Cache is created for `n1` and `n2`. Great 👍
• I add a task to the workflow: `n1 -> n2 -> n3`
• I expected `n1` and `n2` to be cached, but they are actually run again. 🤔
Is this the expected behaviour (i.e. the cache is managed "per task per workflow version", and not only "per task")? Or am I doing something wrong in my code?
It should be per task
It seems either your task's inputs or its cache version changed
Hmmm ok, that's strange.
Share the snippet, I do this all the time
Hmmmm, to be honest it's a bit too big to share. I made a complex cache computation function based on the call graph (using a lib called pycg).
I have ~150 lines of code to build the cache key 😕
Wow, is it even identical?
Haaaaaaaaa, I think I found why. I get all the function dependencies with pycg, then try to import them and get their source to compute a hash. But when instantiated by flytekit, my code is not able to import the task (e.g. if my task `my_task` is located in the `my_workflow.py` file, the code will try `import my_workflow.my_task`, get the source and hash it, but somehow, inside the flytekit import system, it doesn't work anymore). As a fallback, my code was taking the source of the whole file (i.e. `import my_workflow`), so the workflow version was indeed included in the cache key of every task. The solution I found is to call this "import and hash" function in a subprocess, outside of any flytekit interference.
If you'd like to play with the code, here it is.
(quite hacky and lacks proper documentation, but it fulfills my need)
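Not the actual code linked above, but a minimal stdlib sketch of the subprocess workaround: the import and the hashing run in a clean child interpreter, so flytekit's import machinery in the parent process can't interfere. Module and function names are illustrative.

```python
import subprocess
import sys

# Small program executed in a fresh interpreter: import the target
# function by name and print the SHA-256 of its source code.
_SNIPPET = """
import hashlib, importlib, inspect, sys
module_name, func_name = sys.argv[1], sys.argv[2]
func = getattr(importlib.import_module(module_name), func_name)
print(hashlib.sha256(inspect.getsource(func).encode()).hexdigest())
"""

def hash_function_source(module_name: str, func_name: str) -> str:
    """Hash a function's source in a subprocess, isolated from the
    parent interpreter's import state."""
    result = subprocess.run(
        [sys.executable, "-c", _SNIPPET, module_name, func_name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Because the child is a plain `python -c` invocation, it sees the module exactly as a normal import would, regardless of what the calling process has done to `sys.modules` or the import hooks.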