<@U06RKAMK0UC> when I run dynamic tasks and one of...
# ask-ai
s
@ancient-wolf-19325 when I run dynamic tasks and one of the tasks that get spawned fails, the other spawned tasks get aborted. Why?
j
When a task spawned by a dynamic workflow fails, Flyte's default failure policy is to abort all currently running nodes and mark the workflow as failed. The default OnFailurePolicy is FAIL_IMMEDIATELY, which stops execution as soon as any node fails and aborts all other running nodes, even independent ones. That is why the other spawned tasks are aborted when one fails.
You can change this by setting the failure policy to FAIL_AFTER_EXECUTABLE_NODES_COMPLETE. With this policy, every node that does not depend on the failed node runs to completion before the workflow is marked as failed. The dependency graph is unchanged, so nodes downstream of the failed node still do not run. See WorkflowMetadata.OnFailurePolicy for details.
More granular control (e.g., allowing a certain percentage of dynamic tasks to fail) is not natively supported for sub-workflows or sub-dynamic-tasks; it is only available for array jobs via min_successes. For dynamic tasks, the system aborts the dynamic job if the minimum number of successes cannot be reached, as described in the DynamicJobSpec. Would you like more detail or code examples?
Sources: WorkflowMetadata.OnFailurePolicy
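To make the difference between the two policies concrete, here is a toy simulation in plain Python. It is not Flyte's implementation: the `Policy` enum, the `simulate` function, and the status strings are all hypothetical names invented for this sketch, and nodes are visited one at a time in dependency order rather than run in parallel.

```python
from enum import Enum


class Policy(Enum):
    # Hypothetical stand-ins for the two OnFailurePolicy values discussed above.
    FAIL_IMMEDIATELY = 1
    FAIL_AFTER_EXECUTABLE_NODES_COMPLETE = 2


def simulate(deps, failing, policy):
    """Toy model of the two failure policies.

    deps: dict mapping node -> list of upstream nodes, listed in
          dependency (topological) order.
    failing: set of nodes that fail when they run.
    Returns a dict mapping node -> status string.
    """
    status = {}
    failed = False
    for node, upstream in deps.items():
        if failed and policy is Policy.FAIL_IMMEDIATELY:
            # After the first failure, every remaining node is aborted.
            status[node] = "ABORTED"
        elif any(status[u] != "SUCCEEDED" for u in upstream):
            # A node never runs if something upstream did not succeed.
            status[node] = "SKIPPED"
        elif node in failing:
            status[node] = "FAILED"
            failed = True
        else:
            status[node] = "SUCCEEDED"
    return status


# "c" depends on "a"; "d" depends on "b"; "a" fails.
deps = {"a": [], "b": [], "c": ["a"], "d": ["b"]}
immediate = simulate(deps, {"a"}, Policy.FAIL_IMMEDIATELY)
after = simulate(deps, {"a"}, Policy.FAIL_AFTER_EXECUTABLE_NODES_COMPLETE)
print(immediate)
print(after)
```

In this model, FAIL_IMMEDIATELY aborts everything after the failure of "a", while FAIL_AFTER_EXECUTABLE_NODES_COMPLETE lets the independent branch "b" and "d" finish and only skips "c", which depends on the failed node. In both cases the workflow as a whole would still end up failed.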
s
@ancient-wolf-19325 what about in Union Serverless?
j
In Union.ai Serverless, the default failure policy for dynamic tasks is the same as in Flyte OSS: if a spawned task fails, the default FAIL_IMMEDIATELY policy aborts all other running nodes and marks the workflow as failed. You can set the failure policy to FAIL_AFTER_EXECUTABLE_NODES_COMPLETE to let independent nodes finish before the workflow is marked as failed; nodes downstream of the failed node still do not run. There is no special override or different default for dynamic task failure handling in Union.ai Serverless, as described in the WorkflowMetadata.OnFailurePolicy documentation. Would you like more detail or examples for configuring this in Union.ai Serverless?
Sources: WorkflowMetadata.OnFailurePolicy
s
@ancient-wolf-19325 and how can I do that in code?
j
You can set the failure policy in your workflow code by specifying the failure_policy parameter in the @workflow decorator. For example, use WorkflowFailurePolicy.FAIL_AFTER_EXECUTABLE_NODES_COMPLETE to allow all independent nodes to finish even if one fails:
from flytekit import workflow, WorkflowFailurePolicy

@workflow(failure_policy=WorkflowFailurePolicy.FAIL_AFTER_EXECUTABLE_NODES_COMPLETE)
def my_workflow():
    # your dynamic tasks or other nodes here
    ...
This works the same way in both Flyte OSS and Union.ai Serverless. See the Flyte failure node documentation for more details. Would you like a full example with dynamic tasks?
Sources: Flyte failure node documentation