lively-sundown-82704
08/21/2023, 3:11 PM
# This workflow does some work to fetch three pieces of data: A, B, C
@workflow
def fetch_data() -> Tuple[A, B, C]:
    ...

# This workflow processes each fetched dataset
@workflow
def process_data(a: A, b: B, c: C):
    first_processing_task(a)
    second_processing_task(b)
    third_processing_task(c)
    ...

# This workflow puts it all together
@workflow
def main():
    a, b, c = fetch_data()
    return process_data(a=a, b=b, c=c)
Because each processing task runs independently, it's theoretically possible for parts of the process_data workflow to run even if other parts cannot. For example, say that fetching datasets A and B is very quick (about 5 seconds) but fetching C takes a long time (about 10 minutes). In that case, it would be ideal for any downstream work relying exclusively on A or B to be launched even while C is still being fetched.
Is there any way to configure Flyte to greedily kick off the tasks for processing A and B in the process_data workflow while we're waiting for output C from the fetch_data workflow? Or is it an ironclad rule within Flyte that the tasks of a downstream workflow can only be launched once the upstream workflow(s) have all fully succeeded?
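For concreteness, here is a self-contained sketch of the structure described above. The placeholder str datasets, dummy task bodies, and sleep-based fetch times are assumptions for illustration, not code from the thread. Because process_data consumes all three outputs of the fetch_data sub-workflow node, none of the processing can start until fetch_data as a whole has succeeded:

import time
from typing import Tuple

from flytekit import task, workflow

@task
def fetch(name: str, delay: int) -> str:
    # Simulate a fetch that takes `delay` seconds.
    time.sleep(delay)
    return f"dataset {name}"

@task
def process(dataset: str) -> str:
    return f"processed {dataset}"

@workflow
def fetch_data() -> Tuple[str, str, str]:
    # A and B are quick; C is slow.
    a = fetch(name="A", delay=5)
    b = fetch(name="B", delay=5)
    c = fetch(name="C", delay=600)
    return a, b, c

@workflow
def process_data(a: str, b: str, c: str) -> Tuple[str, str, str]:
    return process(dataset=a), process(dataset=b), process(dataset=c)

@workflow
def main() -> Tuple[str, str, str]:
    # process_data is downstream of the whole fetch_data sub-workflow node,
    # so it cannot start until A, B, and C have all been produced.
    a, b, c = fetch_data()
    return process_data(a=a, b=b, c=c)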
hallowed-mouse-14616
08/21/2023, 3:23 PM
@workflow
def fetch_data_a() -> A:
    ...

@workflow
def fetch_data_b() -> B:
    ...

@workflow
def fetch_data_c() -> C:
    ...

@workflow
def process_data_a(a: A):
    first_processing_task(a)
    ...

@workflow
def main():
    a = fetch_data_a()
    a2 = process_data_a(a=a)
    b = fetch_data_b()
    b2 = process_data_b(b=b)
    c = fetch_data_c()
    c2 = process_data_c(c=c)
    return a2, b2, c2
If there are no dependencies between fetching / processing A, B, and C, they should probably be separate tasks?
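A minimal sketch of that "separate tasks" suggestion, again with assumed placeholder types and bodies rather than code from the thread. With plain tasks wired together in a single workflow, each processing node depends only on its own fetch node, so Flyte can schedule processing for A and B as soon as their fetches finish, even while C is still being fetched:

from typing import Tuple

from flytekit import task, workflow

@task
def fetch_a() -> str:
    return "dataset A"  # quick fetch

@task
def fetch_b() -> str:
    return "dataset B"  # quick fetch

@task
def fetch_c() -> str:
    return "dataset C"  # slow fetch

@task
def process(dataset: str) -> str:
    return f"processed {dataset}"

@workflow
def main() -> Tuple[str, str, str]:
    # Each process call consumes a single fetch output, so the engine
    # starts it as soon as that one input is available.
    a2 = process(dataset=fetch_a())
    b2 = process(dataset=fetch_b())
    c2 = process(dataset=fetch_c())
    return a2, b2, c2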
lively-sundown-82704
08/21/2023, 3:32 PM
freezing-boots-56761
@task? @workflow seems like overkill.