# announcements
m
Hi guys! Is there a way to invalidate the cache for a certain input for a task within a workflow, without increasing the cache version/invalidating the cache of other inputs?
We have a heavy-lift task that generated corrupt results for 6 of ~800 inputs (caused by our own code), so we'd like to rerun just those 6 without touching the cache for the others.
n
hi @Maarten de Jong unfortunately afaik there’s no way to do this for particular sets of inputs. What’s the cost of invalidating the cache for all inputs (i.e. recomputing everything)?
m
hey, in this particular case it's quite expensive (~15TB of data to redownload and reprocess) but we have a workaround for now, just wanted to make sure we're not missing anything. I could drop a feature request ticket on the github if you think it is technically feasible and relevant
n
^^ @Eduardo Apolinario (eapolinario) @Ketan (kumare3)?
did the error/corruption occur in the download or process step?
m
in the download step, so subsequent processing steps used the corrupt download
just to highlight that the download is from an internal service and corruption occurred due to stuff on our side, not because of a Flyte download or similar
k
@Maarten de Jong how about you run the 6 steps manually
Just like interruptible override we can have cache override
So tbh, if something was corrupt then something has to change. Maybe the ref to the dataset has to change, else the invariant of when to cache is broken
@Maarten de Jong one way would be to delete the entry from data catalog
It's a DB entry, just delete it. You can find the dataset ID from Flyte console / FlyteAdmin REST API
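To illustrate why deleting a single catalog entry reruns only that input while bumping the cache version reruns everything, here is a toy model in plain Python. It is only a sketch under assumptions: `ToyCatalog`, `cache_key`, and `evict` are hypothetical names made up for illustration, and this is not how Flyte's data catalog is actually implemented or keyed.

```python
import hashlib
import json


def cache_key(task_name: str, cache_version: str, inputs: dict) -> str:
    """Hypothetical key: task name + cache version + inputs, hashed together."""
    payload = json.dumps(
        {"task": task_name, "version": cache_version, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ToyCatalog:
    """In-memory stand-in for a cache catalog (illustration only)."""

    def __init__(self):
        self._entries = {}

    def get_or_compute(self, task_name, cache_version, inputs, compute):
        # Reuse the stored output if this exact (task, version, inputs) was seen.
        key = cache_key(task_name, cache_version, inputs)
        if key not in self._entries:
            self._entries[key] = compute(**inputs)
        return self._entries[key]

    def evict(self, task_name, cache_version, inputs) -> bool:
        # Remove one entry; every other input keeps its cached output.
        key = cache_key(task_name, cache_version, inputs)
        if key in self._entries:
            del self._entries[key]
            return True
        return False
```

In this model, changing `cache_version` changes every key at once, so all inputs recompute, whereas `evict` for one input leaves the rest untouched. That is the effect of the "delete the entry" workaround.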
n
that's what I've been looking into briefly as another workaround for this one instance. as it might become more relevant in future, larger runs, something that doesn't require digging around in a DB manually would be preferred
definitely something I can look into as well 🙂
k
For now you can delete the entry
This is surgery
@Maarten de Jong does that help
m
yeah definitely for now, thank you
👍 1
n
@Ketan (kumare3) I believe I managed to find them that way (looking into the issue with @Maarten de Jong). for the future, having some option/extra checkbox to run without cache via the flyte console would be great, I'll put it onto our internal todo list
can also create a github issue for tracking if you'd like
k
Yes please do
👍 1
But let's write an rfc for this, as this might have subtleties
n
sure thing
would you like me to write that up as well (not sure I can finish a full RFC doc before my holiday starting tomorrow) or contribute to something you're writing up?
k
No I think you should write it
👍 1
It's ok take your time, we will not get a chance to work on this
n
I'll start a draft tomorrow and will probably finish it when I'm back in two weeks then. We'll also have a bit more of a discussion internally about what'd be helpful to us
👍 1
Took longer than expected to actually finish, but the RFC's just been published: https://github.com/flyteorg/flyte/pull/2633
🙏 1