# ask-the-community
e
Hi, some more questions about Flyte caching. Is there a way to view the remote cache and answer questions like:
• See what tasks have cache entries and what keys/values are being stored?
• See how much storage it’s taking up?
• How often the cache is hit or missed when a task is called?
s
Hey @Dan Rammer (hamersaw)! Is this achievable?
e
A few more to add on:
• How long are cache entries stored?
• Do we have any control over when cache entries are removed?
  ◦ e.g. setting a TTL or manually deleting a cache_version
d
@Eric Song all great questions! I'll see what I can tackle here:
• See what tasks have cache entries and what keys/values are being stored? -- In the UI there is a cache icon (indicating "hit" / "populated" / etc.) on each individual cacheable task execution. If you're looking for something like a separate page listing cached tasks and versions, that doesn't exist, but the information could be retrieved with a DB query.
• See how much storage it’s taking up? -- This is not currently displayed, but it is available through a DB query and the subsequent blobstore information.
• How often the cache is hit or missed when a task is called? -- This could be answered by a DB query into the task execution metadata that FlyteAdmin stores. The table that stores each individual task execution has a field for the cache status. As far as integrating this into the UI, there's no reason it couldn't be done; we're just gauging the priority pending community interest.
• How long are cache entries stored? -- Indefinitely unless otherwise specified. There is a `maxCacheAge` flag, set on FlytePropeller, which indicates that any cached data older than X should not be used. Unfortunately this is not currently configurable on a per-task basis. This is something we have discussed adding many times, most recently in this issue.
• Do we have any control over when cache entries are removed? -- There has been great community involvement here, as this is an issue many people face. Currently, Flyte has an `overwriteCache` flag that can be set when launching an execution (either from the UI or the CLI); it indicates that all cached tasks should be recomputed and the new results used. Additionally, this issue (and many associated PRs) contains an implementation of cache deletion for specific executions, so a user (again from the UI or the CLI) can manually delete all cached data that resulted from a specific execution. We have not merged the cache delete functionality yet, but it is on the roadmap at high priority.
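As a rough illustration of the hit/miss question above, here is a minimal sketch of the kind of aggregate query that could be run. The table name `task_executions` and the columns `task_name` and `cache_status` are hypothetical stand-ins, not the actual FlyteAdmin schema, and the status strings are assumed to mirror Flyte's cache status names. It also assumes that both a plain miss and a "populated" result count as misses. An in-memory SQLite database stands in for the real DB so the example is self-contained.

```python
import sqlite3


def cache_stats(conn):
    """Return {task_name: (hits, misses, hit_rate)} for cacheable executions.

    Hypothetical schema: a `task_executions` table with `task_name` and
    `cache_status` columns. The real FlyteAdmin tables may look different.
    """
    rows = conn.execute(
        """
        SELECT task_name,
               SUM(CASE WHEN cache_status = 'CACHE_HIT' THEN 1 ELSE 0 END),
               SUM(CASE WHEN cache_status IN ('CACHE_MISS', 'CACHE_POPULATED')
                        THEN 1 ELSE 0 END)
        FROM task_executions
        WHERE cache_status != 'CACHE_DISABLED'  -- skip non-cacheable tasks
        GROUP BY task_name
        ORDER BY task_name
        """
    ).fetchall()
    stats = {}
    for name, hits, misses in rows:
        total = hits + misses
        # Guard against tasks whose only rows carry some other status.
        stats[name] = (hits, misses, hits / total if total else 0.0)
    return stats


def build_demo_db():
    """Create an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_executions (task_name TEXT, cache_status TEXT)"
    )
    conn.executemany(
        "INSERT INTO task_executions VALUES (?, ?)",
        [
            ("train", "CACHE_POPULATED"),
            ("train", "CACHE_HIT"),
            ("train", "CACHE_HIT"),
            ("train", "CACHE_HIT"),
            ("preprocess", "CACHE_MISS"),
            ("preprocess", "CACHE_HIT"),
            ("notify", "CACHE_DISABLED"),
        ],
    )
    return conn


if __name__ == "__main__":
    for task, (hits, misses, rate) in cache_stats(build_demo_db()).items():
        print(f"{task}: hits={hits} misses={misses} hit_rate={rate:.2f}")
```

Against a real deployment, the same grouping would need to be adapted to wherever the cache status field actually lives in the FlyteAdmin database.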
e
Thanks for the thorough answers @Dan Rammer (hamersaw)! It looks like I can get a lot of the information I need by querying the DB. Do you know if there is any documentation (in code rather than on the web is fine) on the DB schemas used for Flyte caching? If not, just the table names would be enough for me to get started.