(Caching) Hey all, I rewrote one of my company’s ...
# ask-the-community
v
(Caching) Hey all, I rewrote one of my company’s critical workflows in Flyte and it works well. However, it frequently runs with a few repeated sets of inputs. For example, it may run 50 times a week with (mostly) the same 2-3 sets of inputs. From my understanding, cache_version only supports one version, so the different input sets can’t all be cached at the same time. If I were to run a workflow with the first set of inputs, then the second, then the third, and repeated that 10 times, caching would effectively be disabled because the cache keys keep changing, right? Is there any workaround or other solution to this problem? I’d also like to configure cache staleness so that cached results expire after a while. I tried to achieve this using lifecycle rules on my cloud storage, but once the cloud deletes the cached outputs, Flyte still tries to fetch them and errors out. Let me know how you’ve dealt with this issue in your cases, thanks.
b
Hi Victor! As for cache hits: I am not 100% sure I understand the question correctly, but I think this should work the way you’d expect. So say you have one task:
from flytekit import task

@task(cache=True, cache_version="1")
def foo(a: int) -> int:
    ...
If you first run it with a=1, then with a=2, and then again with a=1, the third run should get a cache hit. Is this what you’ve been thinking? As for cache invalidation, in addition to the bucket lifecycle rules you have to set the max-cache-age flytepropeller setting.
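To make the cache-hit behaviour above concrete, here is a minimal toy sketch in plain Python. It is not Flyte's implementation: the `cached` decorator, the `_cache` dict and the `calls` list are hypothetical names used only for illustration, and the assumption is simply that the cache key covers the task name, the cache version and the input values.

```python
from typing import Any, Callable, Dict, List, Tuple

# Toy in-memory cache store (hypothetical; Flyte uses its own cache service).
_cache: Dict[Tuple[str, str, Tuple], Any] = {}
# Records the inputs of every call that actually executed the function body.
calls: List[dict] = []


def cached(cache_version: str) -> Callable:
    """Cache results keyed on (function name, cache_version, inputs)."""
    def decorator(fn: Callable) -> Callable:
        def wrapper(**inputs: Any) -> Any:
            key = (fn.__name__, cache_version, tuple(sorted(inputs.items())))
            if key not in _cache:
                # Cache miss: run the function and store its result.
                calls.append(inputs)
                _cache[key] = fn(**inputs)
            return _cache[key]
        return wrapper
    return decorator


@cached(cache_version="1")
def foo(a: int) -> int:
    return a * 10


foo(a=1)  # miss: executes
foo(a=2)  # miss: executes, but the entry for a=1 stays cached
foo(a=1)  # hit: served from the cache
print(len(calls))  # 2
```

Under that assumption, each distinct set of inputs gets its own cache entry, so alternating between a few input sets does not evict earlier results.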
v
Thanks a lot for the quick response and the cache invalidation tip! Now I can just set the max cache age slightly lower than the lifecycle rule on the bucket, and it will work well. This is the part that worried me in the docs:
"Bumping the cache_version is akin to invalidating the cache."
But now I see that I should in theory be able to keep the cache_version permanently at the same value and it will cache multiple sets of inputs. I am testing this now, will update here when I confirm it works. Thanks a lot
b
Fixing the cache version should do the trick - let us know in case it doesn’t work! In case you’re using map_task, there’s currently a known issue with regards to caching which should be fixed within the week. Also, be careful with setting the cache version to a fixed value - as we all know, together with naming, caching is one of the two hard things in computer science.
k
Cache version is a user-controlled parameter that lets you tell the system that the behavior of the code has changed.
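A rough way to picture this: if the cache version is one ingredient of the cache key, bumping it makes every old entry unreachable even when the inputs are identical. The sketch below is an illustration under that assumption only; `cache_key` is a hypothetical helper, and Flyte's real key derivation includes more than what is shown here.

```python
import hashlib
import json


def cache_key(task_name: str, cache_version: str, inputs: dict) -> str:
    """Hypothetical key: a hash over task name, cache version and inputs."""
    payload = json.dumps(
        {"task": task_name, "version": cache_version, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


# Same version and same inputs: the keys match, so the lookup would hit.
same = cache_key("foo", "1", {"a": 1}) == cache_key("foo", "1", {"a": 1})

# Bumped version with the same inputs: the keys differ, so the lookup would miss.
bumped = cache_key("foo", "1", {"a": 1}) == cache_key("foo", "2", {"a": 1})

print(same, bumped)  # True False
```

So the version only needs to change when the task's code changes in a way that should invalidate earlier results.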
v
Thanks for the clarifications, I just tested running everything with the same cache versions and it works well