Varun Kulkarni08/21/2023, 3:11 PM
```python
# This workflow does some work to fetch three pieces of data: A, B, C
@workflow
def fetch_data() -> Tuple[A, B, C]:
    ...

# This workflow processes each fetched dataset
@workflow
def process_data(a: A, b: B, c: C):
    first_processing_task(a)
    second_processing_task(b)
    third_processing_task(c)
    ...

# This workflow puts it all together
@workflow
def main():
    a, b, c = fetch_data()
    return process_data(a=a, b=b, c=c)
```
Because each processing task runs independently, it's theoretically possible for parts of the `process_data` workflow to run even if other parts cannot. For example, let's say that fetching datasets A and B is very quick (say, 5 sec) but C takes a very long time (say, 10 min). In this case, it would be ideal for any downstream work relying exclusively on A or B to be launched even while C is still being fetched. Is there any way to configure Flyte to greedily kick off the tasks for processing A and B in the `process_data` workflow while we're still waiting for output C from the `fetch_data` workflow? Or is it an ironclad rule within Flyte that the tasks of a downstream workflow can only be launched once the upstream workflow(s) have all fully succeeded?
Dan Rammer (hamersaw)08/21/2023, 3:23 PM
If there are no dependencies between fetching / processing A, B, and C, they should probably be separate tasks?
```python
@workflow
def fetch_data_a() -> A:
    ...

@workflow
def fetch_data_b() -> B:
    ...

@workflow
def fetch_data_c() -> C:
    ...

@workflow
def process_data_a(a: A):
    first_processing_task(a)
    ...

# process_data_b and process_data_c defined analogously

@workflow
def main():
    a = fetch_data_a()
    a2 = process_data_a(a=a)
    b = fetch_data_b()
    b2 = process_data_b(b=b)
    c = fetch_data_c()
    c2 = process_data_c(c=c)
    return a2, b2, c2
```
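The restructuring works because a DAG scheduler derives edges from data dependencies: once each fetch → process pair shares no data with the others, the fast chains can complete without waiting on the slow one. A minimal plain-Python sketch of that behavior (not Flyte itself; all names here are hypothetical stand-ins):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(name: str, delay_s: float) -> str:
    """Stand-in for a fetch_data_* workflow; sleeps to mimic I/O latency."""
    time.sleep(delay_s)
    return f"data_{name}"

def process(payload: str) -> str:
    """Stand-in for a process_data_* workflow."""
    return payload.upper()

def chain(name: str, delay_s: float) -> str:
    # Each dataset's fetch -> process pair is one independent chain.
    return process(fetch(name, delay_s))

start = time.monotonic()
with ThreadPoolExecutor() as pool:
    # A and B are quick, C is slow; processing A and B does not wait
    # for C's fetch because the chains share no data dependency.
    a, b, c = pool.map(chain, ["a", "b", "c"], [0.05, 0.05, 0.5])
elapsed = time.monotonic() - start

print(a, b, c)  # DATA_A DATA_B DATA_C
# elapsed is roughly max(0.05, 0.05, 0.5) = 0.5 s, not the 0.6 s sum
```

In the single `fetch_data` version, by contrast, all three outputs flow through one tuple, so every consumer depends on the whole upstream node finishing.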
Varun Kulkarni08/21/2023, 3:32 PM
seems like overkill.