<@U06RKAMK0UC> How do you define a workflow where ...
# ask-ai
b
@ancient-wolf-19325 How do you define a workflow where two tasks need to run in parallel and a third task needs to run after the first two complete? There is no data dependency between the tasks.
j
It seems likely that you can achieve this in Flyte by defining a workflow where the first two tasks are executed in parallel, and the third task is executed after the completion of the first two tasks. You can use the `@workflow` decorator to define the workflow and the `@task` decorator for the tasks. The tasks can be executed in parallel by simply calling them without any dependencies, and the third task can be called after the first two tasks. Here's a basic example:
```python
from flytekit import task, workflow

@task
def task_a() -> None:
    print("task_a")

@task
def task_b() -> None:
    print("task_b")

@task
def task_c() -> None:
    print("task_c")

@workflow
def my_wf() -> None:
    a = task_a()
    b = task_b()
    task_c()
```
In this example, `task_a` and `task_b` will run in parallel, and `task_c` will run after both `task_a` and `task_b` have completed. Would you like more details on this setup? Sources: Flyte Community Discussion
b
@ancient-wolf-19325 The solution does not work. A workflow like the one above results in all three tasks running in parallel.
j
It seems likely that to ensure `task_c` runs after `task_a` and `task_b`, you need to explicitly create dependencies between the tasks. You can achieve this by using the `create_node` function to establish the order of execution. Here's how you can modify the workflow:
```python
from flytekit import task, workflow
from flytekit.core.node_creation import create_node

@task
def task_a() -> None:
    print("task_a")

@task
def task_b() -> None:
    print("task_b")

@task
def task_c() -> None:
    print("task_c")

@workflow
def my_wf() -> None:
    node_a = create_node(task_a)
    node_b = create_node(task_b)
    node_c = create_node(task_c)

    # Ensure task_c runs after task_a and task_b
    node_a >> node_c
    node_b >> node_c
```
In this setup, `task_a` and `task_b` will run in parallel, and `task_c` will only run after both have completed. Would you like more details on this approach? Sources: Flyte GitHub Issue
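The ordering that two edges into `task_c` describe can be sketched outside Flyte with Python's standard-library `graphlib`. This is only an illustrative model of the dependency graph, not Flyte's scheduler, and the string node names are placeholders for the three tasks. With edges from `task_a` and `task_b` into `task_c`, both upstream tasks are ready at once and `task_c` becomes ready only after both are marked done.

```python
from graphlib import TopologicalSorter

# Map each node to the set of nodes that must finish before it.
# Only task_c has predecessors; task_a and task_b have none.
graph = {"task_c": {"task_a", "task_b"}}

sorter = TopologicalSorter(graph)
sorter.prepare()

# First wave: every node with no unfinished predecessors.
first_wave = sorted(sorter.get_ready())
sorter.done(*first_wave)

# Second wave: nodes unblocked once the first wave is done.
second_wave = sorted(sorter.get_ready())

print(first_wave)   # ['task_a', 'task_b']
print(second_wave)  # ['task_c']
```

Nodes that land in the same wave have no ordering between them, which is the property the question asks for.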
b
@ancient-wolf-19325 Yes please, specifically the fact that tasks produce Promises, which are not accepted by the `create_node` function.
j
In Flyte, tasks within a workflow return `Promise` objects, which represent future values that are not yet computed. The `create_node` function is not designed to work directly with `Promise` objects. Instead, you should use the `Promise` objects to manage dependencies directly within the workflow function. To ensure that `task_c` runs after `task_a` and `task_b`, you can use the `Promise` objects returned by `task_a` and `task_b` to create a dependency for `task_c`. Here's how you can structure your workflow:
```python
from flytekit import task, workflow

@task
def task_a() -> None:
    print("task_a")

@task
def task_b() -> None:
    print("task_b")

@task
def task_c() -> None:
    print("task_c")

@workflow
def my_wf() -> None:
    a = task_a()
    b = task_b()
    # Use the Promises to ensure task_c runs after task_a and task_b
    a >> b >> task_c()
```
In this setup, `task_a` and `task_b` will run in parallel, and `task_c` will run after both have completed. The `>>` operator is used to establish the order of execution based on the `Promise` objects. Would you like more details on how `Promise` objects work in Flyte? Sources: Flyte Documentation on Task Execution
a
@brief-family-9488 does the above work for you?
b
@average-finland-92144 The above works when you want `task_a` and `task_b` to run sequentially before running `task_c`.
However, it does not work if I want to run `task_a` and `task_b` in parallel and then run `task_c`.
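The sequential behavior is what operator chaining would be expected to produce: Python evaluates `a >> b >> c` as `(a >> b) >> c`, so if `>>` records "left runs before right" and hands back its right operand, the chain adds an edge from `a` to `b`. A minimal sketch of that, using a hypothetical `Node` class that only imitates the operator and is not flytekit's implementation:

```python
class Node:
    """Toy stand-in for a workflow node; not a flytekit class."""

    def __init__(self, name, edges):
        self.name = name
        self.edges = edges  # shared list of (upstream, downstream) pairs

    def __rshift__(self, other):
        # Record "self runs before other" and return the right-hand side,
        # so a chained expression continues from `other`.
        self.edges.append((self.name, other.name))
        return other


def make_nodes(edges):
    """Create three nodes that record their edges in the same list."""
    return Node("a", edges), Node("b", edges), Node("c", edges)


# Chained form: evaluated as (a >> b) >> c, so a is ordered before b.
chained = []
a, b, c = make_nodes(chained)
a >> b >> c

# Fan-in form: two separate statements, no edge between a and b.
fan_in = []
a, b, c = make_nodes(fan_in)
a >> c
b >> c

print(chained)  # [('a', 'b'), ('b', 'c')]
print(fan_in)   # [('a', 'c'), ('b', 'c')]
```

Under that assumption, writing the two dependencies as separate statements (`a >> c` and `b >> c` on the values returned by the task calls) would leave `task_a` and `task_b` unordered relative to each other. Whether the promises returned by `None`-returning tasks behave this way is worth confirming against the flytekit version in use.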
a
What about using a MapTask? It's designed to perform parallel execution of a single task over multiple inputs. Not sure if that's your use case
b
hmm, that might work, I'll take a look
thanks for the tip
a
Sure, if there are more questions let us know 😉