square-boots-41503
04/10/2023, 2:52 PM
Annotated HashMethod to the input types of a task instead of the output. Then, the HashMethod can be run on the inputs when a cached task is invoked to determine the cache keys and check whether there's a hit.
I tried this out, and it kind of works for local caching, but not for remote.
import typing as ty

import pandas as pd
from flytekit import dynamic, task, HashMethod

# hash_df, redis_client, and REDIS_COUNTER_KEY are defined elsewhere
# (their definitions are not shown in the thread).

@task(cache=True, cache_version="1.0")
def my_task(obj: ty.Annotated[pd.DataFrame, HashMethod(hash_df)]) -> pd.DataFrame:
    redis_client.incr(REDIS_COUNTER_KEY)
    return obj

@dynamic
def my_workflow():
    obj = pd.DataFrame(
        {
            "name": ["a", "b"],
            "val": ["test1", "test2"],
        }
    )
    obj = my_task(obj=obj)
    obj = my_task(obj=obj)
    my_task(obj=obj)
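hash_df itself isn't defined anywhere in the thread; a minimal sketch of such a helper, assuming it only needs to produce a deterministic string over the DataFrame's contents, could be:

```python
import pandas as pd


def hash_df(df: pd.DataFrame) -> str:
    # Hash every row (values + index) and fold into a single stable string.
    # pd.util.hash_pandas_object returns a uint64 Series, one hash per row.
    return str(pd.util.hash_pandas_object(df, index=True).sum())
```

Two frames with identical contents then hash to the same key, which is exactly what an input-based cache lookup needs.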
In the example above, my_task is called three times the first time my_workflow runs. This doesn't match my expectation: I thought it would be called once on the first call, with cache hits on the second and third calls, since the input obj is the same.
However! The second run of my_workflow has a cache hit for all three calls to my_task, so it does work to some extent, even though I don't fully understand what it's doing.
For remote caching, this doesn't seem to work at all: there are no cache hits no matter how many times I run my_workflow.
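For intuition only, here is a hypothetical illustration of input-hash-based caching; this is not Flyte's actual key derivation, just a sketch of the idea the thread is discussing: a lookup hits whenever the task identity, cache_version, and all input hashes match a previous call.

```python
import hashlib
from typing import Dict


def cache_key(task_name: str, cache_version: str, input_hashes: Dict[str, str]) -> str:
    # Combine task identity, version, and per-input hashes into one key.
    # Sorting the inputs makes the key independent of argument order.
    parts = [task_name, cache_version] + [
        f"{name}={h}" for name, h in sorted(input_hashes.items())
    ]
    return hashlib.sha256(";".join(parts).encode()).hexdigest()
```

Under this model, changing any input's hash, or bumping cache_version, yields a different key and therefore a cache miss.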
thankful-minister-83577
@task
def t1() -> T: ...

@task(cache=True, cache_version="1")
def t2(a: T): ...

t2(a=t1())
square-boots-41503
04/10/2023, 7:40 PM
obj is initially created through a pandas.DataFrame constructor, so the first call to my_task would never get a cache hit?
thankful-minister-83577