Hey, a question about the recover function in Flyte:
I think it is currently not possible to recover a failed workflow with different task versions, right?
I'm looking at use cases where a workflow fails, the user registers a new version of the failed task, and then recovers the failed workflow with the latest task version. I think this can also be done via caching, but is this maybe on some agenda?
11/17/2023, 11:35 AM
given the static nature of how workflows are defined, i think this is a ways off. it kind of goes against the reproducibility of flyte, i feel.
but maybe possible one day? more likely by defining a new workflow with the new version.
11/17/2023, 12:00 PM
Yeah makes complete sense to me - thank you!
11/17/2023, 8:50 PM
@Yee - Jan brings up an interesting point in this. We've hit this too and I think this scenario is pretty common.
Imagine we run our N-step workflow. At, say, step 5, it fails because the task has a problem. We need to fix our code so the task can complete, but steps 1-4 are very long-running tasks and the outputs they created are large.
What is the "most right" way to re-use those nicely preserved/immutable outputs from steps 1-4? So far, we've just re-run the entire workflow or come up with not-so-great ways to re-use the outputs.
11/21/2023, 1:58 AM
Could you two work on a ticket for this? Whatever form this takes, this will be a pretty core change to flyte. I can see some ways in which this can be accomplished, but they probably fall more in line with what terence was thinking… re-using prior outputs in a new workflow definition, rather than somehow augmenting the existing workflow or workflow execution. to do what you described with what exists today, @Terence Kent, you would’ve needed the foresight to make the outputs cached.
i feel like the resolution here will ultimately take the form of something akin to enabling and filling in cache from a prior execution. i’m sure others will have different ideas too
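
To make the caching idea concrete, here is a minimal sketch in plain Python. It is not Flyte's API or implementation: `run_task`, `cache_key`, the in-memory `cache` dict, and the step functions are all hypothetical stand-ins. The assumption is that a task's output is stored under a key derived from the task name, a cache version, and its inputs, so a re-run with a fixed step 5 reuses the outputs of steps 1-4. In flytekit the corresponding opt-in is `cache=True` together with a `cache_version` on `@task`, and it has to be enabled before the first run.

```python
import hashlib
import json

# Hypothetical in-memory stand-in for a task output cache (not Flyte's API).
cache = {}
attempted = []  # names of tasks whose code actually ran (cache misses)


def cache_key(task_name, cache_version, inputs):
    """Derive a stable key from task name, cache version, and inputs."""
    payload = json.dumps([task_name, cache_version, inputs], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def run_task(task_name, cache_version, fn, inputs):
    """Return the cached output if present; otherwise run fn and store it."""
    key = cache_key(task_name, cache_version, inputs)
    if key in cache:
        return cache[key]
    attempted.append(task_name)
    result = fn(**inputs)  # a failure here leaves nothing in the cache
    cache[key] = result
    return result


def expensive_step(x):
    # Stand-in for a long-running task with large outputs.
    return x * 2


def broken_step(x):
    raise ValueError("bug in step 5")


def fixed_step(x):
    return x + 1


def workflow(final_fn, final_version):
    """Steps 1-4 are unchanged; only the code and version of step 5 vary."""
    value = 1
    for i in range(1, 5):
        value = run_task(f"step{i}", "1.0", expensive_step, {"x": value})
    return run_task("step5", final_version, final_fn, {"x": value})


# First run: steps 1-4 succeed and are cached, step 5 fails.
try:
    workflow(broken_step, "1.0")
except ValueError:
    pass
first_run = list(attempted)

# Second run with the fixed step 5: steps 1-4 are cache hits.
attempted.clear()
result = workflow(fixed_step, "1.1")
second_run = list(attempted)
```

In this sketch only `step5` runs the second time, because the keys for steps 1-4 are unchanged. Bumping the version for step 5 mirrors the usual convention of changing the cache version when task code changes; here it is not strictly required, since the failed attempt never wrote a cache entry.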