#ask-the-community

Yoni

01/13/2024, 11:20 AM
Hi, I'm encountering the following weird situation. Create `task1`, a cached task that returns a dataframe annotated with `Annotated[pd.DataFrame,(hash_pandas_dataframe)]`, as suggested in the user guide. When I run a map task on it, I always get a cache miss. After some investigation, I realised this is due to the way the array task name is generated: it includes the `python_interface` of the task, which has the following form:
(-> (o0: typing.Annotated[pandas.core.frame.DataFrame, <flytekit.core.hash.HashMethod object at 0x7fd3e5b818d0>])
This is caused by the `HashMethod` being instantiated in the task definition: its repr includes a memory address (`0x7fd3e5b818d0` above), which differs between runs. Is there a workaround?
Update: for some reason, `map_task` caching works when using dataframes with annotations. However, because of the way the hash is generated, map tasks and regular tasks don't share caches.
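A minimal, standard-library-only sketch of why a name built this way breaks caching: the default repr of a plain Python object embeds its memory address, so any string derived from `str()` of an `Annotated` type carrying such an object changes whenever the object is recreated (for example, in a new process). `HashMethod` below is a hypothetical stand-in, not flytekit's class, and `hash_rows`, `unstable_name` and `stable_name` are likewise hypothetical helpers. Stripping the `Annotated` metadata is just one way to get a stable name; it is not necessarily what flyteorg/flytekit#2113 does.

```python
import hashlib
from typing import Annotated, Callable, get_args, get_origin


class HashMethod:
    """Hypothetical marker object; it keeps the default object repr (with address)."""

    def __init__(self, fn: Callable[[list], str]) -> None:
        self.fn = fn


def hash_rows(rows: list) -> str:
    # Hypothetical hash function standing in for hash_pandas_dataframe.
    return hashlib.sha256(repr(rows).encode()).hexdigest()


def unstable_name(task_name: str, output_type: object) -> str:
    # Embeds str(type) verbatim, including "<... HashMethod object at 0x...>".
    return f"{task_name}-{output_type}"


def stable_name(task_name: str, output_type: object) -> str:
    # Drop the Annotated metadata and keep only the underlying type.
    if get_origin(output_type) is Annotated:
        output_type = get_args(output_type)[0]
    return f"{task_name}-{output_type!r}"


# Two equivalent annotations whose HashMethod objects live at different addresses.
t1 = Annotated[list, HashMethod(hash_rows)]
t2 = Annotated[list, HashMethod(hash_rows)]
print(unstable_name("task1", t1) == unstable_name("task1", t2))  # False
print(stable_name("task1", t1) == stable_name("task1", t2))      # True
```

Within one process the two `HashMethod` objects merely stand in for "the same annotation recreated on a later run"; across processes the address changes even when the definition does not.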

Dan Rammer (hamersaw)

01/16/2024, 1:21 PM
cc @Paul Dittamo
@Yoni, do you mind creating a GitHub issue for this so we can track it better?

Yoni

01/16/2024, 3:37 PM
Ok

Paul Dittamo

01/22/2024, 8:06 PM
@Yoni This map task caching issue should be resolved by flyteorg/flytekit#2113. One thing to note: because this update changes the task names, map tasks that were already cached will get a cache miss on their first run. Please re-open the issue and tag me if you notice any further issues.
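Why a rename causes a one-time miss can be sketched with a toy cache, assuming (as this thread implies) that the cache key includes the task name alongside a cache version and a hash of the inputs. `TaskCache` and its key layout are hypothetical illustrations, not Flyte's actual cache implementation.

```python
from typing import Callable, Dict, Tuple

# Hypothetical key layout: (task name, cache version, hash of inputs).
CacheKey = Tuple[str, str, str]


class TaskCache:
    """Toy in-memory cache keyed partly on the task name (illustration only)."""

    def __init__(self) -> None:
        self._store: Dict[CacheKey, object] = {}

    def run(self, task_name: str, cache_version: str, inputs_hash: str,
            compute: Callable[[], object]) -> Tuple[object, bool]:
        key = (task_name, cache_version, inputs_hash)
        if key in self._store:
            return self._store[key], True   # cache hit: reuse stored result
        value = compute()
        self._store[key] = value
        return value, False                 # cache miss: computed and stored


cache = TaskCache()
cache.run("old-name", "1", "abc", lambda: 42)         # first run populates the cache
print(cache.run("old-name", "1", "abc", lambda: 42))  # (42, True)
print(cache.run("new-name", "1", "abc", lambda: 42))  # (42, False): renamed task misses once
print(cache.run("new-name", "1", "abc", lambda: 42))  # (42, True)
```

The same reasoning explains why a map task and a regular task only share cache entries if they resolve to the same name.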

Yoni

01/23/2024, 5:32 PM
Thank you