Brian Tang
04/28/2022, 9:50 PM
cache_version, task signature, or inputs change. This aligns with the example provided:
In the above example, calling square(n=2) twice (even if it’s across different executions or different workflows) will only execute the multiplication operation once.
• However, in my testing, caching a task and then making a change to a different task in the workflow causes the “cached” task to be recomputed. The code shows the hashed key contains a core.Identifier that has the task version in it. Logs are also showing the same, which would mean each iteration on the workflow would recompute the “cached” task:
"Successfully cached results to catalog - Task [resource_type:TASK project:\"fraud-intelligence\" domain:\"development\" name:\"src.python.flyte.fraud_intelligence.training_set.main.filter_and_label\" version:\"e8758db3b7a1e6fffbb0bb73c742310acd7e774f\" ]"
Are the docs just outdated? If the implementation is correct, how are cached tasks expected to be reused when iterating on a workflow? Is the only way to reuse a cached task through a reference task?
Yee
04/28/2022, 9:57 PM
Brian Tang
04/28/2022, 9:59 PM
@task(cache=True, cache_version="v1")
def filter_and_label(
    user: str,
    snapshot_date: str,
    start_date: str,
    end_date: str,
    label_type: str,
    final_features_path: str,
    output_path: str,
) -> str:
    ...  # task body not included in the message
Yee
04/28/2022, 10:04 PM
Brian Tang
04/28/2022, 10:05 PM
Yee
04/28/2022, 10:07 PM
discoveryVersion is what you expect in all cases?
Brian Tang
04/28/2022, 10:09 PM
$ flytectl get tasks -d development -p fraud-intelligence src.python.flyte.fraud_intelligence.training_set.main.filter_and_label -oyaml
- closure:
    compiledTask:
      template:
        ...
        metadata:
          discoverable: true
          discoveryVersion: v1
Yee
04/28/2022, 10:11 PM
Brian Tang
04/28/2022, 10:11 PM
flyte_cached-goqzg39XfX_GSwutxjbTzJghG38yCEerd52cCCV6zzA and running an updated workflow is showing
"DataCatalog failed to get artifact by tag flyte_cached-2zsk_u8ljbfKgGhocfVu4Cmm6aHxU8YfO63yFx92duk"
Yee
04/28/2022, 10:12 PM
core.Identifier you were talking about?
Brian Tang
04/28/2022, 10:13 PM
catalog.Key
"type": "python-task",
"metadata": {
  "discoverable": true,
  "runtime": {
    "type": 1,
    "version": "0.26.0",
    "flavor": "python"
  },
  "retries": {},
  "discoveryVersion": "v1"
},
looks consistent between both of them
Dan Rammer (hamersaw)
04/28/2022, 10:40 PM
Brian Tang
04/28/2022, 10:42 PM
end_date: 2022-04-26
snapshot_date: 2022-04-26
start_date: 2022-04-25
user: btang
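For reference, the two tag names reported earlier in the thread both have the shape flyte_cached- followed by 43 URL-safe characters, which is what an unpadded base64 encoding of a 32-byte digest looks like. A minimal sketch of that idea, assuming the tag is derived from a hash of the task inputs. The JSON serialization and the `artifact_tag` helper are illustrative assumptions, not Flyte's actual code:

```python
import base64
import hashlib
import json


def artifact_tag(inputs: dict) -> str:
    """Hypothetical helper: build a tag name from a digest of the task inputs."""
    # Assumption: inputs are serialized deterministically (sorted keys) before hashing.
    payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).digest()  # 32 bytes
    # URL-safe base64 without padding yields 43 characters for a 32-byte digest.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"flyte_cached-{encoded}"


first_run = {"user": "btang", "start_date": "2022-04-25", "end_date": "2022-04-26"}
same_inputs = dict(first_run)
changed_input = {**first_run, "end_date": "2022-04-27"}

tag_a = artifact_tag(first_run)
tag_b = artifact_tag(same_inputs)    # identical inputs, identical tag
tag_c = artifact_tag(changed_input)  # one differing value, different tag
```

Under this assumption, two different tags for the same task would point at some input value differing between the runs, including inputs not shown in the listing above, rather than at the task version.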
Dan Rammer (hamersaw)
04/28/2022, 10:57 PM
Ketan (kumare3)
04/29/2022, 1:08 AM
Brian Tang
04/29/2022, 4:22 PM
"DataCatalog failed to get dataset for ID resource_type:TASK project:\"fraud-intelligence\" domain:\"development\" name:\"src.python.flyte.fraud_intelligence.training_set.main.filter_and_label\" version:\"e8758db3b7a1e6fffbb0bb73c742310acd7e774f\"
at first glance made me believe those were all a part of the cache key
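A minimal sketch of the distinction this thread ends on, assuming the behavior described in the docs quoted in the first message (a cached result is reused unless cache_version, the task signature, or the inputs change). The `TaskId` and `cache_key` names are hypothetical and this is not Flyte's implementation. The point is only that an identifier printed in a log can carry a registration version that the cache key itself ignores:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskId:
    """Hypothetical stand-in for the identifier printed in the logs."""
    project: str
    domain: str
    name: str
    version: str  # registration version, e.g. a git SHA


def cache_key(task: TaskId, cache_version: str, signature: str, inputs: dict) -> tuple:
    # Assumption: task.version is deliberately left out of the key.
    return (
        task.project,
        task.domain,
        task.name,
        cache_version,
        signature,
        tuple(sorted(inputs.items())),
    )


inputs = {"user": "btang", "start_date": "2022-04-25"}
sig = "(user: str, start_date: str) -> str"
old = TaskId("fraud-intelligence", "development", "filter_and_label", "sha-old")
new = TaskId("fraud-intelligence", "development", "filter_and_label", "sha-new")

cache = {cache_key(old, "v1", sig, inputs): "cached output"}

hit = cache.get(cache_key(new, "v1", sig, inputs))   # re-registered task, same key
miss = cache.get(cache_key(new, "v2", sig, inputs))  # bumped cache_version, new key
```

Here `hit` returns the stored value even though the registration version changed, while `miss` is `None` because cache_version was bumped.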