Flyte #flyte-support

aloof-painting-18735

04/15/2024, 1:29 PM

Hi Community! We are testing dynamic workflows and found an interesting system behaviour that I would like to discuss. We have a dynamic workflow that runs dynamic tasks, e.g.

n0
├── n0-0-dn0
├── n0-0-dn1
├── n0-0-dn2
├── n0-0-dn3
├── n0-0-dn4
├── n0-0-dn5
├── n0-0-dn6
├── n0-0-dn7
├── n0-0-dn8
└── n0-0-dn9

The nodes `n0-0-dn0` ... `n0-0-dn9` run in parallel. If one of them (e.g. `n0-0-dn9`) fails, Flyte aborts all the other nodes (`n0-0-dn0` ... `n0-0-dn8`). Is this the intended behavior? Is it configurable? Re-running every node because of a small intermittent issue in one of them adds computational cost. @full-toddler-5766 @gentle-state-35322 @careful-holiday-56196
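To make the two behaviors discussed in this thread concrete, here is a toy asyncio model. It is not Flyte's implementation: Flyte schedules nodes through its backend, not Python coroutines. The node IDs mirror the tree above. `fake_node`, `run_fan_out`, the sleep durations, and the choice of `n0-0-dn9` as the failing node are all hypothetical. With `fail_fast=True` the model mimics what the question describes: once `n0-0-dn9` fails, the still-running siblings are aborted. With `fail_fast=False` every sibling runs to completion and the failure is only reported afterwards.

```python
import asyncio
from typing import Dict

# Hypothetical: which node fails in this toy model.
FAILING_NODE = "n0-0-dn9"


async def fake_node(name: str, should_fail: bool, seconds: float) -> str:
    """Stand-in for one parallel node: sleep, then succeed or raise."""
    await asyncio.sleep(seconds)
    if should_fail:
        raise RuntimeError(f"{name} hit an intermittent error")
    return name


async def run_fan_out(fail_fast: bool) -> Dict[str, str]:
    """Run ten sibling nodes concurrently and report each node's final state.

    fail_fast=True:  cancel still-running siblings as soon as one node fails.
    fail_fast=False: let every sibling finish, then report the failure.
    """
    tasks = {}
    for i in range(10):
        name = f"n0-0-dn{i}"
        failing = name == FAILING_NODE
        # The failing node errors out quickly; the healthy ones take longer.
        tasks[name] = asyncio.create_task(
            fake_node(name, failing, 0.01 if failing else 0.2)
        )

    if fail_fast:
        # Return as soon as any node raises.
        _, pending = await asyncio.wait(
            tasks.values(), return_when=asyncio.FIRST_EXCEPTION
        )
        for t in pending:
            t.cancel()  # abort the siblings that are still running
        # Wait for the cancellations to take effect.
        await asyncio.gather(*pending, return_exceptions=True)
    else:
        # Wait for every node, collecting exceptions instead of raising.
        await asyncio.gather(*tasks.values(), return_exceptions=True)

    states: Dict[str, str] = {}
    for name, t in tasks.items():
        if t.cancelled():
            states[name] = "ABORTED"
        elif t.exception() is not None:
            states[name] = "FAILED"
        else:
            states[name] = "SUCCEEDED"
    return states


if __name__ == "__main__":
    for mode in (True, False):
        label = "fail_fast" if mode else "run_to_completion"
        print(label, asyncio.run(run_fan_out(fail_fast=mode)))
```

In the fail-fast run the healthy nodes end up `ABORTED` and would have to be re-run, which is the extra cost the question is worried about. In the run-to-completion run they end up `SUCCEEDED`.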

thankful-minister-83577

04/15/2024, 10:36 PM

this is the intended behavior but it’s also customizable.

thankful-minister-83577

04/15/2024, 10:36 PM

are you setting the failure mode?

thankful-minister-83577

04/15/2024, 10:36 PM

https://github.com/flyteorg/flytekit/blob/e1e21da4ce75e08308e4f3f212f7efbe546ff8cb/flytekit/core/workflow.py#L854

thankful-minister-83577

04/15/2024, 10:37 PM

there’s one called `FAIL_AFTER_EXECUTABLE_NODES_COMPLETE`

aloof-painting-18735

04/17/2024, 12:01 PM

@thankful-minister-83577 thanks for the hint, pretty useful! No, we are not setting the failure mode; we just use the default settings. Let's give it a try! Thanks again!
