# ask-the-community
a
hello, does anyone know why i am seeing the `cache was disabled for this task` message for a simple python task, when I have the cache enabled:
from flytekit import task

@task(cache=True, cache_version="1.0")
def simple_python_task(name: str):
    print(f"Hello {name}")
I see this error in datacatalog:
{
  "json": {},
  "level": "warning",
  "msg": "Dataset does not exist key: {Project:flytetester Name:flyte_task-simple_python_task Domain:development Version:1.0-Y1uUT6Xg-GKw-c0Pw UUID:}, err missing entity of type Dataset with identifier project:\"flytetester\" name:\"flyte_task-.simple_python_task\" domain:\"development\" version:\"1.0-Y1uUT6Xg-GKw-c0Pw\" ",
  "ts": "2023-02-17T19:01:01Z"
}
The cache is not being written at all.. would appreciate any pointers on how to debug
Looks like a task must have an output to be cached?
d
That is correct.
a
Thanks! another question, does "fast registration" have any impact on the cache? like technically a code change in the fast registration could also invalidate the cache
or do we rely on cache_version being updated for that
d
We rely on the `cache_version` currently. There have been various discussions about using a hash of the function contents to define caching, so that changes in the function are detected automatically. But frequently enough, users want to fix a minor bug in the function without invalidating previously cached data.
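To make the trade-off concrete, here is a minimal sketch comparing a key built from an explicit version string with a key built from the function body. Both `version_key` and `content_key` are hypothetical helpers written for illustration only; this is not how Flyte computes its cache keys.

```python
import hashlib
import json


def version_key(task_name: str, cache_version: str, inputs: dict) -> str:
    """Key built from a user-managed version string plus the inputs."""
    payload = json.dumps([task_name, cache_version, inputs], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def content_key(fn, task_name: str, inputs: dict) -> str:
    """Hypothetical alternative: key built from the compiled function body.

    Only bytecode and constants are hashed, which is enough for this sketch
    but not a robust fingerprint of a function.
    """
    code = fn.__code__
    body = code.co_code + repr(code.co_consts).encode()
    payload = json.dumps([task_name, inputs], sort_keys=True).encode()
    return hashlib.sha256(body + payload).hexdigest()


def greet_old(name: str) -> str:
    return "Helo " + name  # minor bug: typo in the greeting


def greet_fixed(name: str) -> str:
    return "Hello " + name  # bug fixed, same intent


inputs = {"name": "flyte"}

# Explicit version left untouched after the fix: same cache entry is reused.
key_before_fix = version_key("greet", "1.0", inputs)
key_after_fix = version_key("greet", "1.0", inputs)
same_entry = key_before_fix == key_after_fix

# Content hash: the same small fix produces a different key.
invalidated = content_key(greet_old, "greet", inputs) != content_key(
    greet_fixed, "greet", inputs
)
```

With the explicit version the author decides when cached data becomes stale (by bumping `"1.0"` to, say, `"1.1"`), whereas the content-based key changes on every edit, including fixes that do not affect previously cached outputs.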
a
that makes sense..! i was actually hoping for that to be the case :)
automatically determining that is going to be a complex problem, and the user probably has more context on what invalidates the cache
d
agreed, a community member also recently contributed support for cache overwrites and cache deletions for a workflow execution. So I think we have most of the scenarios covered 😅
a
Nice!
we were also going to look into building something for explicitly purging cache.. but seems like we already have something like that available
d
take a look at https://github.com/flyteorg/flyte/issues/2867. the cache delete functionality is not fully merged yet, but coming very soon!
There's no auto-purging, but there should be an endpoint on admin to delete the cache for a workflow or node execution.
a
also some kind of possibility of cache invalidation.. like if we can validate whether the output location actually exists in the underlying storage could be useful
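As a rough illustration of that idea, the sketch below treats a cache hit as valid only when the recorded output location still exists. The `lookup_validated` helper and the dict-based cache are assumptions made for this example (a local file stands in for the underlying storage); nothing here is an existing Flyte or datacatalog API.

```python
import os
import tempfile
from typing import Dict, Optional


def lookup_validated(cache: Dict[str, str], key: str) -> Optional[str]:
    """Return the cached output path only if it still exists; else evict it."""
    path = cache.get(key)
    if path is None:
        return None  # plain cache miss
    if not os.path.exists(path):
        del cache[key]  # stale entry: the output is gone from storage
        return None
    return path


# Usage: the entry points at a real file, then the file disappears.
with tempfile.TemporaryDirectory() as tmp:
    output = os.path.join(tmp, "result.txt")
    with open(output, "w") as f:
        f.write("Hello")
    cache = {"task-key": output}

    hit = lookup_validated(cache, "task-key")   # output exists: valid hit
    os.remove(output)
    miss = lookup_validated(cache, "task-key")  # output missing: evicted
```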
d
Yeah, that could be powerful. There have been a few discussions about cache expirations too - nothing being worked on yet but may be nice to get on the roadmap.
a
I was wondering if we should throw some kind of warning during compilation if the cache directive has no effect because the task has no outputs
d
that sounds like a great idea to me! i think it would have to come from flytekit. cc @Yee @Eduardo Apolinario (eapolinario) thoughts on warning if a user defines a task cachable with no outputs (if there are no outputs propeller disables caching).
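A minimal sketch of what such a check could look like, using only the standard library. `has_outputs` and `check_cacheable` are hypothetical names for illustration; this is not flytekit code, and a real implementation would presumably inspect the task's typed interface rather than the raw signature.

```python
import inspect
import warnings


def has_outputs(fn) -> bool:
    """True if the function declares a non-None return annotation."""
    ret = inspect.signature(fn).return_annotation
    return ret is not inspect.Signature.empty and ret is not None


def check_cacheable(fn, cache: bool) -> None:
    """Warn when caching is requested for a function with no declared outputs."""
    if cache and not has_outputs(fn):
        warnings.warn(
            f"{fn.__name__} sets cache=True but declares no outputs; "
            "caching will have no effect.",
            UserWarning,
            stacklevel=2,
        )


def simple_python_task(name: str):
    print(f"Hello {name}")


def greeting_task(name: str) -> str:
    return f"Hello {name}"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_cacheable(simple_python_task, cache=True)  # no outputs: warns
    check_cacheable(greeting_task, cache=True)       # has an output: silent
```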
y
yeah that’s something we can add for sure.
a
i can send a pr
y
oh beautiful
thank you!