Hi there! I'm just discovering Flyte and have a question regarding pipelines that I haven't found an answer to yet:
Is it possible to achieve this:
• Task A
◦ checks external source periodically for updates
◦ if external source changes, provide different outputs
◦ invalidate all produced artifacts (e.g. checkpoints, metrics, ...) of the whole pipeline
• Task B (depends on Task A)
◦ restart when Task A changes its outputs
The use case would be that a user changes the training configuration, so the training has to be restarted. Also, all checkpoints produced so far should be invalidated.
tall-lock-23197
02/06/2023, 4:34 AM
Hi @flaky-car-61813!
• You can check an external source for updates within a Flyte task. A Flyte task is essentially a Python function, so you can do anything.
• You can provide different outputs, but the values you return have to match the output types declared in the task's signature.
• By invalidation, do you mean delete? You can version your data, models, metrics and so on with Flyte. You can pick the latest and that should solve the invalidation case, right?
• You can restart Task B when Task A changes. You'll need to handle this yourself in code, or you can automate it using external systems. I highly recommend watching this recording, which elaborates on the signaling feature of Flyte: https://www.youtube.com/watch?v=njNKBke5sQ0
I'm guessing this is what you're looking for.
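To make the versioning idea concrete, here is a minimal sketch in plain Python. It is not Flyte API: `config_version`, `ArtifactStore`, `task_a` and `task_b` are hypothetical names made up for illustration, and the in-memory store stands in for wherever checkpoints would really live. The assumption is that the training configuration is a JSON-serializable dict. Task A fingerprints the config, and Task B keys its checkpoints by that fingerprint, so a changed config leaves old checkpoints unused rather than deleted.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def config_version(config: dict) -> str:
    """Return a short, deterministic fingerprint of a training config."""
    # Sorting keys makes the fingerprint independent of dict ordering.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


@dataclass
class ArtifactStore:
    """Hypothetical in-memory store: checkpoints are keyed by config version."""

    checkpoints: Dict[str, str] = field(default_factory=dict)


def task_a(config: dict, last_seen_version: Optional[str]) -> Tuple[str, bool]:
    """Stand-in for Task A: report the current version and whether it changed."""
    version = config_version(config)
    return version, version != last_seen_version


def task_b(version: str, store: ArtifactStore) -> str:
    """Stand-in for Task B: reuse a checkpoint for this version or 'retrain'."""
    existing = store.checkpoints.get(version)
    if existing is not None:
        # Same config as before: nothing to redo.
        return existing
    # New version: checkpoints of older versions are simply never looked up,
    # which is the "invalidation by versioning" idea.
    checkpoint = f"checkpoint-{version}"
    store.checkpoints[version] = checkpoint
    return checkpoint
```

With this layout, a periodic run only needs to call `task_a`, compare the returned flag, and call `task_b` with the new version. If I remember correctly, Flyte's task caching (`cache=True` with a `cache_version`) follows a similar principle, where changed inputs produce a cache miss, but please check the docs for the exact behavior.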