# ask-the-community
(Caching) Hey all, I rewrote one of my company’s critical workflows in Flyte and it works well. However, it frequently runs with the same few sets of inputs. For example, it may run 50 times a week with mostly the same 2-3 sets of inputs. From my understanding, cache_version only supports one version, so the different input sets can’t all be cached at the same time. If I ran the workflow with the first set of inputs, then the second, then the third, and repeated that 10 times, caching would effectively be disabled because the cache keys keep changing, right? Is there a workaround or another solution to this problem? I’d also like to configure cache staleness so that cached outputs expire after a while. I tried to achieve this with lifecycle rules on my cloud storage, but once the storage deletes the cached outputs, Flyte still tries to fetch them and fails. Let me know how you have dealt with this in your own setups, thanks.
Hi Victor! As to cache hits: I am not 100% sure whether I understand things correctly, but I think this should work as you’d expect it. So say you have one task:
from flytekit import task
@task(cache=True, cache_version="1")
def foo(a: int) -> int:
    return a  # placeholder body; the original snippet omitted it
And you first run it with one input value, then with a different value, and then again with the first value, the third run should get a cache hit. Is this what you’ve been thinking? As for cache invalidation, in addition to the bucket lifecycle rules you have to set the max cache age flytepropeller setting.
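To make the cache-hit behaviour above concrete, here is a minimal sketch in plain Python. It is a toy model, not Flyte's actual implementation: the `run_cached` helper, the in-memory `_cache` dict, and the body of `foo` are all hypothetical. The only assumption taken from the thread is that the lookup key depends on the task, its `cache_version`, and the input values, so each distinct input set gets its own entry.

```python
from typing import Any, Callable, Dict, Tuple

# Toy stand-in for a cache store; hypothetical, not Flyte's real implementation.
_cache: Dict[Tuple[str, str, Tuple[Tuple[str, Any], ...]], Any] = {}


def run_cached(
    name: str, cache_version: str, fn: Callable[..., Any], **inputs: Any
) -> Tuple[Any, bool]:
    """Return (result, cache_hit); the key combines task name, version, and inputs."""
    key = (name, cache_version, tuple(sorted(inputs.items())))
    if key in _cache:
        return _cache[key], True
    result = fn(**inputs)
    _cache[key] = result
    return result, False


def foo(a: int) -> int:
    return a * 2  # hypothetical task body


# Alternate between two input sets while keeping the version fixed.
hits = [run_cached("foo", "1", foo, a=a)[1] for a in (1, 2, 1, 2)]
print(hits)  # [False, False, True, True]
```

In this model, alternating between input sets does not evict anything: the first run of each input set is a miss, and every later run with the same inputs and the same version is a hit.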
Thanks a lot for the quick response and the cache invalidation tip! Now I can just set the max cache age slightly lower than the lifecycle rule on the bucket, and it will work well. This is the part that worried me in the docs:
Bumping the cache_version is akin to invalidating the cache.
But now I see that I should in theory be able to keep the cache_version permanently at the same value and it will cache multiple sets of inputs. I am testing this now, will update here when I confirm it works. Thanks a lot
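The reasoning behind keeping the max cache age slightly below the bucket lifecycle rule can be sketched as follows. This is only an illustration in plain Python: the `is_fresh` helper and both durations are hypothetical, and it assumes that an entry older than the max cache age is treated as a cache miss.

```python
from datetime import datetime, timedelta


def is_fresh(created_at: datetime, now: datetime, max_cache_age: timedelta) -> bool:
    """An entry at or beyond max_cache_age is treated as a cache miss."""
    return now - created_at < max_cache_age


bucket_expiry = timedelta(days=30)  # hypothetical lifecycle rule on the bucket
max_cache_age = timedelta(days=29)  # hypothetical, kept below the bucket expiry

created = datetime(2024, 1, 1)
for age_days in (10, 29, 30):
    now = created + timedelta(days=age_days)
    object_exists = now - created < bucket_expiry
    would_read = is_fresh(created, now, max_cache_age)
    # The cache must never try to read an object the bucket has already deleted.
    assert not would_read or object_exists
    print(age_days, would_read, object_exists)
```

Because the entry stops counting as fresh before the storage deletes the object, a lookup never points at outputs that are already gone.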
Fixing the cache version should do the trick - let us know in case it doesn’t work! In case you’re using
, there’s currently a known issue with regard to caching which should be fixed within the week. Also, be careful with setting the cache version to a fixed value: as we all know, together with naming, caching is one of the two hard things in computer science.
The cache version is a user-controlled parameter that lets you tell the system that the behavior of the code has changed.
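A small sketch of why bumping the version acts like an invalidation, again as a toy model in plain Python rather than Flyte's real mechanism. The `lookup_or_compute` helper and the doubling computation are hypothetical; the assumption is only that the version is part of the lookup key.

```python
from typing import Dict, Tuple

# Hypothetical in-memory cache keyed by (cache_version, input value).
cache: Dict[Tuple[str, int], int] = {}


def lookup_or_compute(cache_version: str, a: int) -> bool:
    """Return True on a cache hit; otherwise store a result and return False."""
    key = (cache_version, a)
    hit = key in cache
    cache.setdefault(key, a * 2)  # placeholder computation
    return hit


first = lookup_or_compute("1", 5)       # nothing cached yet
second = lookup_or_compute("1", 5)      # same version, same input
after_bump = lookup_or_compute("2", 5)  # new version changes the key
print(first, second, after_bump)  # False True False
```

The old entry is not deleted in this model; it is simply never looked up again once the version changes, which is why the version should only be bumped when the task's behaviour changes.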
Thanks for the clarifications, I just tested running everything with the same cache versions and it works well