#ask-the-community

Ankit Goyal

02/17/2023, 7:12 PM
hello, does anyone know why i am seeing "cache was disabled for this task" for a simple python task, when I have the cache enabled:
@task(cache=True, cache_version="1.0")
def simple_python_task(name: str):
    print(f"Hello {name}")
I see this error in datacatalog:
{
  "json": {},
  "level": "warning",
  "msg": "Dataset does not exist key: {Project:flytetester Name:flyte_task-simple_python_task Domain:development Version:1.0-Y1uUT6Xg-GKw-c0Pw UUID:}, err missing entity of type Dataset with identifier project:\"flytetester\" name:\"flyte_task-.simple_python_task\" domain:\"development\" version:\"1.0-Y1uUT6Xg-GKw-c0Pw\" ",
  "ts": "2023-02-17T19:01:01Z"
}
The cache is not being written at all. I would appreciate any pointers on how to debug this.
Looks like a task must have an output to be cached?

Dan Rammer (hamersaw)

02/17/2023, 7:52 PM
Looks like a task must have an output to be cached?
That is correct.
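A minimal sketch of the fix, assuming the greeting itself is a reasonable output to return: giving the task a declared return value means there is something to cache. The `try`/`except` fallback is a stand-in decorator (not part of flytekit) that only exists so the snippet runs where flytekit is not installed.

```python
try:
    from flytekit import task
except ImportError:
    # Stand-in decorator (NOT part of flytekit) so the sketch runs without it.
    def task(**_kwargs):
        def wrap(fn):
            return fn
        return wrap


@task(cache=True, cache_version="1.0")
def simple_python_task(name: str) -> str:
    # Returning the greeting gives the task an output that can be cached.
    greeting = f"Hello {name}"
    print(greeting)
    return greeting
```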

Ankit Goyal

02/17/2023, 7:52 PM
Thanks! another question: does "fast registration" have any impact on the cache? technically, a code change shipped through fast registration could also invalidate the cache,
or do we rely on cache_version being updated for that?

Dan Rammer (hamersaw)

02/17/2023, 7:54 PM
We rely on the cache_version currently. There have been various discussions about using a hash of the function contents to define caching, so that changes to the function could be detected automatically. But frequently enough, users want to fix a minor bug in the function without invalidating previously cached data.
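To illustrate the trade-off, here is a rough sketch of what hashing the function contents could look like. This is not how Flyte works today; `content_hash` is a hypothetical helper that hashes only the compiled bytecode and constants, and it ignores nested functions. Even a cosmetic fix changes the hash, while an explicit version string stays whatever the user sets it to.

```python
import hashlib


def content_hash(fn) -> str:
    """Hash a function's bytecode and constants (hypothetical helper)."""
    code = fn.__code__
    payload = code.co_code + repr(code.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def greet_v1(name: str) -> str:
    return "Hello " + name


def greet_v2(name: str) -> str:
    # Cosmetic fix: add a comma after the greeting.
    return "Hello, " + name


CACHE_VERSION = "1.0"  # explicit version, unchanged by the cosmetic fix

# With content hashing, the two variants get different cache keys...
hash_changed = content_hash(greet_v1) != content_hash(greet_v2)
# ...whereas a key built from the explicit version alone stays the same.
version_key_v1 = ("greet", CACHE_VERSION)
version_key_v2 = ("greet", CACHE_VERSION)
```

With the explicit version, the user decides whether the fix should invalidate earlier results by bumping `CACHE_VERSION` or leaving it alone.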

Ankit Goyal

02/17/2023, 7:55 PM
that makes sense! i was actually hoping for that to be the case :)
determining this automatically is going to be a complex problem, and the user probably has more context on what invalidates the cache

Dan Rammer (hamersaw)

02/17/2023, 7:57 PM
agreed. a community member also recently contributed support for cache overwrites and cache deletions for a workflow execution. So I think we have most of the scenarios covered 😅

Ankit Goyal

02/17/2023, 7:57 PM
Nice!
we were also going to look into building something for explicitly purging the cache, but it seems like something like that is already available

Dan Rammer (hamersaw)

02/17/2023, 7:59 PM
take a look at https://github.com/flyteorg/flyte/issues/2867. the cache delete functionality is not fully merged yet, but coming very soon!
There's no auto-purging, but there should be an endpoint on admin to delete the cache for a workflow or node execution.

Ankit Goyal

02/17/2023, 8:01 PM
some form of cache validation could also be useful, e.g. checking whether the cached output location actually still exists in the underlying storage before treating it as a hit
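A toy sketch of that idea, using the local filesystem as a stand-in for the underlying storage. Nothing here is a Flyte API: `lookup_valid`, the dictionary cache, and the key format are all hypothetical.

```python
import os
import tempfile
from typing import Dict, Optional


def lookup_valid(cache: Dict[str, str], key: str) -> Optional[str]:
    """Return the cached output path only if it still exists in storage."""
    path = cache.get(key)
    if path is None:
        return None
    if not os.path.exists(path):
        # Stale entry: drop it so the task would be recomputed.
        del cache[key]
        return None
    return path


with tempfile.TemporaryDirectory() as tmp:
    output = os.path.join(tmp, "result.txt")
    with open(output, "w") as f:
        f.write("Hello Flyte")

    cache = {"simple_python_task:1.0": output}
    hit = lookup_valid(cache, "simple_python_task:1.0")   # output still exists
    os.remove(output)                                      # output deleted out of band
    miss = lookup_valid(cache, "simple_python_task:1.0")  # entry is invalidated
```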

Dan Rammer (hamersaw)

02/17/2023, 8:03 PM
Yeah, that could be powerful. There have been a few discussions about cache expirations too - nothing being worked on yet but may be nice to get on the roadmap.

Ankit Goyal

02/17/2023, 8:05 PM
I was wondering if we should emit some kind of warning during compilation when the cache directive has no effect because the task has no outputs

Dan Rammer (hamersaw)

02/17/2023, 8:10 PM
that sounds like a great idea to me! i think it would have to come from flytekit. cc @Yee @Eduardo Apolinario (eapolinario) thoughts on warning when a user marks a task with no outputs as cacheable? (if there are no outputs, propeller disables caching.)
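One way such a check might look, sketched with the standard library only. This is not flytekit code; `warn_if_uncacheable` is a hypothetical helper that treats a missing or `None` return annotation as "no outputs".

```python
import inspect
import warnings


def warn_if_uncacheable(fn, cache: bool) -> bool:
    """Warn when caching is requested for a function that declares no output.

    Returns True if a warning was emitted (hypothetical helper).
    """
    returns = inspect.signature(fn).return_annotation
    no_output = returns is inspect.Signature.empty or returns is None
    if cache and no_output:
        warnings.warn(
            f"{fn.__name__}: cache=True has no effect because the task has no outputs"
        )
        return True
    return False


def no_output_task(name: str):
    print(f"Hello {name}")


def output_task(name: str) -> str:
    return f"Hello {name}"
```

Calling `warn_if_uncacheable(no_output_task, cache=True)` emits the warning, while `output_task` passes silently.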

Yee

02/17/2023, 8:13 PM
yeah that's something we can add for sure.

Ankit Goyal

02/17/2023, 8:13 PM
i can send a pr

Yee

02/17/2023, 8:13 PM
oh beautiful
thank you!