Hi All
We are looking into a use case where a data importer workflow produces N datasets (N > 1000), each of which updates sporadically. We would like to construct a workflow for training a model that depends only on a small subset (< 5) of the N datasets.
Ideally:
1. the model workflow should only trigger when one of the datasets it depends on is updated.
2. the model workflow is independent of the importer workflow
How would you recommend solving this in Flyte?
tall-lock-23197
05/09/2023, 4:47 AM
How about encapsulating the dataset-update check in a task and branching on its result in a conditional? The task can return a boolean indicating whether the datasets the model depends on were updated. If the boolean is true, trigger the model workflow. You can also run this check on a schedule so that the model workflow is triggered on a regular cadence whenever updates are found.
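
As a rough illustration of the check such a task might perform, here is a minimal sketch in plain Python. It assumes the importer records some version marker per dataset (a content hash or timestamp, for example) and that the markers from the last training run are stored somewhere. The function name `subset_updated`, the dataset names, and the marker values are all hypothetical and not part of Flyte.

```python
from typing import Iterable, Mapping


def subset_updated(
    current_versions: Mapping[str, str],
    last_trained_versions: Mapping[str, str],
    subset: Iterable[str],
) -> bool:
    """Return True if any dataset in `subset` changed since the last training run."""
    # A dataset counts as updated when its current marker differs from the
    # marker recorded at the last training run (a missing marker is None).
    return any(
        current_versions.get(name) != last_trained_versions.get(name)
        for name in subset
    )


# Hypothetical version markers (e.g. content hashes) recorded by the importer.
current = {"ds_0001": "a1", "ds_0002": "b7", "ds_0999": "c3"}
last_trained = {"ds_0001": "a1", "ds_0002": "b6", "ds_0999": "c3"}

# Only ds_0002 changed, so a model that does not depend on it is left alone.
print(subset_updated(current, last_trained, ["ds_0001", "ds_0999"]))  # False
print(subset_updated(current, last_trained, ["ds_0001", "ds_0002"]))  # True
```

In flytekit, logic like this could sit inside a `@task` that returns a `bool`, with the workflow branching on it via `conditional(...).if_(updated.is_true())`, and a `LaunchPlan` with a `CronSchedule` could run the check on a cadence. How the version markers are stored and looked up is left open here and depends on your setup.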