Thread
#announcements
    Maarten de Jong

    3 months ago
    Hi guys! Is there a way to invalidate the cache for specific inputs to a task within a workflow, without bumping the cache version (which would invalidate the cache for all other inputs too)?
    We have a heavy-lifting task that produced corrupt results for 6 of ~800 inputs (caused by our own code), so we'd like to rerun just those 6 without touching the cached results for the others.
    Niels Bantilan

    3 months ago
    hi @Maarten de Jong, unfortunately, afaik there’s no way to do this for particular sets of inputs. What’s the cost of invalidating the cache for all inputs (i.e. recomputing everything)?
    Maarten de Jong

    3 months ago
    hey, in this particular case it's quite expensive (~15TB of data to redownload and reprocess), but we have a workaround for now; just wanted to make sure we're not missing anything. I could open a feature request on GitHub if you think it's technically feasible and relevant
    Niels Bantilan

    3 months ago
    ^^ @Eduardo Apolinario (eapolinario) @Ketan (kumare3)?
    did the error/corruption occur in the download or process step?
    Maarten de Jong

    3 months ago
    in the download step, so subsequent processing steps used the corrupt download
    just to highlight that the download is from an internal service and corruption occurred due to stuff on our side, not because of a Flyte download or similar
    Ketan (kumare3)

    3 months ago
    @Maarten de Jong how about you run the 6 steps manually
    Just like interruptible override we can have cache override
    So tbh, if something was corrupt, then something has to change. Maybe the reference to the dataset has to change; otherwise the invariant of when to cache is broken
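A minimal sketch of what a per-execution "cache override" could mean, assuming it behaves like the interruptible override mentioned above: the cache lookup is skipped, the task is recomputed, and the fresh result overwrites the stored entry under the same key, so the cache version (and every other input's entry) is left alone. This is a toy in-memory model; `run`, `override`, `key_for` and `flaky_download` are hypothetical names, not existing Flyte or flytekit APIs.

```python
import hashlib
import json
from typing import Callable, Dict

# Hypothetical toy cache; NOT Flyte's datacatalog or any real Flyte flag.
store: Dict[str, object] = {}


def key_for(cache_version: str, inputs: dict) -> str:
    # Deterministic key from the cache version plus the serialized inputs.
    payload = json.dumps({"v": cache_version, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def run(fn: Callable[..., object], cache_version: str, inputs: dict,
        override: bool = False) -> object:
    key = key_for(cache_version, inputs)
    if not override and key in store:
        return store[key]          # cache hit: reuse the stored result
    result = fn(**inputs)          # miss, or override requested: recompute
    store[key] = result            # overwrite the entry in place
    return result


attempt = {"n": 0}


def flaky_download(shard: int) -> str:
    attempt["n"] += 1
    # The first call simulates the corrupt result caused by user code.
    return "CORRUPT" if attempt["n"] == 1 else f"data-{shard}"


print(run(flaky_download, "1", {"shard": 7}))                 # CORRUPT (computed, cached)
print(run(flaky_download, "1", {"shard": 7}))                 # CORRUPT (cache hit)
print(run(flaky_download, "1", {"shard": 7}, override=True))  # data-7 (recomputed)
print(run(flaky_download, "1", {"shard": 7}))                 # data-7 (new cached value)
```

The override is an explicit signal from the user that the cached result is wrong, which is one way to keep the "when to cache" invariant honest without changing the version for everyone.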
    @Maarten de Jong one way would be to delete the entry from the data catalog
    It's a DB entry, just delete it. You can find the dataset ID from the Flyte console / FlyteAdmin REST API
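A simplified model of why deleting individual entries works where bumping the cache version does not: Flyte's cache lookup is keyed on, among other things, the task's cache version and a hash of its input values, so a version bump changes every key at once, while removing one entry only forces a recompute for that input. This is a hypothetical in-memory sketch, not Flyte's datacatalog; `ToyCatalog`, `cache_key`, `run_cached` and the corrupt shard indices are illustrative assumptions.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for a task cache; NOT Flyte's datacatalog API.
# Keys combine the task name, its cache version and a hash of the inputs.
Key = Tuple[str, str, str]


def cache_key(task_name: str, cache_version: str, inputs: dict) -> Key:
    digest = hashlib.sha256(json.dumps(inputs, sort_keys=True).encode()).hexdigest()
    return (task_name, cache_version, digest)


class ToyCatalog:
    def __init__(self) -> None:
        self.entries: Dict[Key, object] = {}
        self.computed = 0  # number of cache misses, i.e. actual task runs

    def run_cached(self, task_name: str, cache_version: str,
                   fn: Callable[..., object], inputs: dict) -> object:
        key = cache_key(task_name, cache_version, inputs)
        if key not in self.entries:
            self.computed += 1
            self.entries[key] = fn(**inputs)
        return self.entries[key]

    def delete(self, task_name: str, cache_version: str, inputs: dict) -> None:
        # "Surgery": drop a single entry so only that input is recomputed.
        self.entries.pop(cache_key(task_name, cache_version, inputs), None)


def download(shard: int) -> str:
    return f"data-{shard}"


catalog = ToyCatalog()
shards = [{"shard": i} for i in range(800)]
for inp in shards:
    catalog.run_cached("download", "1", download, inp)

# Delete only the 6 (hypothetical) corrupt entries, then rerun at the same version.
for i in [3, 42, 100, 250, 511, 799]:
    catalog.delete("download", "1", {"shard": i})
before = catalog.computed
for inp in shards:
    catalog.run_cached("download", "1", download, inp)
print(catalog.computed - before)  # 6

# Bumping the version instead changes every key, so all 800 inputs miss.
before = catalog.computed
for inp in shards:
    catalog.run_cached("download", "2", download, inp)
print(catalog.computed - before)  # 800
```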
    Nick Müller (MorpheusXAUT)

    3 months ago
    that's what I've been briefly looking into as another workaround for this one instance. Since this might become more relevant in future, larger runs, something that doesn't require manually digging around in a DB would be preferable
    definitely something I can look into as well 🙂
    Ketan (kumare3)

    3 months ago
    For now you can delete the entry
    This is surgery
    @Maarten de Jong does that help?
    Maarten de Jong

    3 months ago
    yeah definitely for now, thank you
    Nick Müller (MorpheusXAUT)

    3 months ago
    @Ketan (kumare3) I believe I managed to find them that way (looking into the issue with @Maarten de Jong). For the future, an option/extra checkbox to run without cache via the Flyte console would be great; I'll put it on our internal todo list
    can also create a GitHub issue for tracking if you'd like
    Ketan (kumare3)

    3 months ago
    Yes please do
    But let's write an RFC for this, as this might have subtleties
    Nick Müller (MorpheusXAUT)

    3 months ago
    sure thing
    would you like me to write that up as well (not sure I can finish a full RFC doc before my holiday starting tomorrow) or contribute to something you're writing up?
    Ketan (kumare3)

    3 months ago
    No, I think you should write it
    It's ok, take your time; we will not get a chance to work on this
    Nick Müller (MorpheusXAUT)

    3 months ago
    I'll start a draft tomorrow and will probably finish it when I'm back in two weeks. We'll also have a bit more of an internal discussion about what would be helpful to us
    Took longer than expected to actually finish, but the RFC's just been published: https://github.com/flyteorg/flyte/pull/2633