square-boots-41503
04/10/2023, 2:52 PM
I attached the HashMethod annotation to the input types of a task instead of the output. The idea is that the HashMethod can be run on the inputs when a cached task is invoked to determine the cache key and then check whether there's a hit.
I tried this out and it kind of works for local caching? but not for remote.
import typing as ty

import pandas as pd
from flytekit import dynamic, task
from flytekit.core.hash import HashMethod

# hash_df, redis_client, and REDIS_COUNTER_KEY are defined elsewhere.
@task(cache=True, cache_version="1.0")
def my_task(obj: ty.Annotated[pd.DataFrame, HashMethod(hash_df)]) -> pd.DataFrame:
    redis_client.incr(REDIS_COUNTER_KEY)
    return obj

@dynamic
def my_workflow():
    obj = pd.DataFrame(
        {
            "name": ["a", "b"],
            "val": ["test1", "test2"],
        }
    )
    obj = my_task(obj=obj)
    obj = my_task(obj=obj)
    my_task(obj=obj)
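(hash_df isn't shown in the snippet above; one plausible implementation, assuming the goal is a deterministic content-based hash so equal DataFrames map to equal cache keys, would be:)

```python
import pandas as pd

def hash_df(df: pd.DataFrame) -> str:
    # Hypothetical sketch of the hash_df referenced above: hash each row
    # deterministically and fold the row hashes into one digest string,
    # so two DataFrames with identical contents hash identically.
    return str(pd.util.hash_pandas_object(df).sum())
```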
In the example above, my_task is called three times the first time my_workflow runs. This doesn't match my expectation: I thought it would be called once for the first call, with cache hits on the second and third calls, since the input obj is the same.
However! The second run of my_workflow has a cache hit for all three calls to my_task so it does work to some extent even though I don’t fully understand what it’s doing.
For remote caching, this doesn't seem to work at all, and there are no cache hits no matter how many times I run my_workflow.
thankful-minister-83577
@task
def t1() -> T: ...

@task(cache=True, cache_version="1")
def t2(a: T): ...

t2(a=t1())
square-boots-41503
04/10/2023, 7:40 PM
obj is initially created through a pandas.DataFrame constructor. Would the first call to my_task never get a cache hit?
thankful-minister-83577