Hi, I'm encountering the following weird situation...
# ask-the-community
y
Hi, I'm encountering the following weird situation: I create task1, a cached task that returns a dataframe annotated with Annotated[pd.DataFrame, HashMethod(hash_pandas_dataframe)], as suggested in the user guide. When running a map task on it I always get a cache miss. After some investigation I realised this is due to the way the array task name is generated: it includes the python_interface of the task, which has the following form:
(-> (o0: typing.Annotated[pandas.core.frame.DataFrame, <flytekit.core.hash.HashMethod object at 0x7fd3e5b818d0>])
This is due to the HashMethod initialization in the task definition. Is there a workaround?
Update: For some reason map_task caching works when using dataframes with annotations. However, because of the way the hash is generated, map tasks and regular tasks don't share caches.
?
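A minimal sketch of why an interface string like the one above is unstable. It does not use flytekit: StandInHashMethod, hash_rows, interface_string and stable_interface_string are hypothetical names made up for illustration, and stripping the address is only one possible way to get a stable string, not necessarily what flytekit does.

```python
import hashlib
import re
from typing import Annotated


class StandInHashMethod:
    """Hypothetical stand-in for a hash-method wrapper that keeps the default object repr."""

    def __init__(self, function):
        self._function = function


def hash_rows(rows: list) -> str:
    # Hypothetical hash function, standing in for hash_pandas_dataframe.
    return hashlib.sha256(repr(rows).encode()).hexdigest()


def interface_string(return_type) -> str:
    # Builds text from the repr of the annotated type, so it embeds
    # "<... object at 0x...>" for any metadata object without a custom repr.
    return f"() -> (o0: {return_type!r})"


def stable_interface_string(return_type) -> str:
    # Drops the " at 0x..." memory address so equal definitions give equal text.
    return re.sub(r" at 0x[0-9a-fA-F]+", "", interface_string(return_type))


# Keep both instances alive so they cannot share a memory address.
method_a = StandInHashMethod(hash_rows)
method_b = StandInHashMethod(hash_rows)

first = interface_string(Annotated[list, method_a])
second = interface_string(Annotated[list, method_b])
print(first == second)  # the two strings differ only in the address

stable_first = stable_interface_string(Annotated[list, method_a])
stable_second = stable_interface_string(Annotated[list, method_b])
print(stable_first == stable_second)
```

Any name or cache key derived from the first kind of string changes whenever the wrapper object lands at a different address, which would be consistent with the cache misses described above.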
d
cc @Paul Dittamo
@Yoni do you mind creating a GitHub issue for this so we can track it better?
y
Ok
p
@Yoni This map task caching issue should be resolved by flyteorg/flytekit#2113. One thing to note: this update will cause cache misses on the first run of already cached map_tasks, because the task names are different. Please re-open the issue and tag me if you notice any further issues.
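A rough sketch of why a renamed task misses the cache once, assuming the cache key is derived from the task name together with the cache version and a hash of the inputs. The cache_key function, the task names and the input hash below are hypothetical and only illustrate the idea, not Flyte's actual key format.

```python
import hashlib


def cache_key(task_name: str, cache_version: str, input_hash: str) -> str:
    # Hypothetical key: any change to the task name yields a different key.
    payload = "|".join([task_name, cache_version, input_hash])
    return hashlib.sha256(payload.encode()).hexdigest()


# Same cache version and inputs, but the generated task name changed.
old_key = cache_key("mapper_task1_old_name", "1.0", "abc123")
new_key = cache_key("mapper_task1_new_name", "1.0", "abc123")
print(old_key == new_key)

# With the new name, repeated runs agree again, so later runs can hit the cache.
repeat_key = cache_key("mapper_task1_new_name", "1.0", "abc123")
print(new_key == repeat_key)
```

Under that assumption, entries written under the old name are simply never looked up again, and the cache is repopulated under the new name on the first run.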
y
Thank you