aloof-painting-18735
04/15/2024, 1:29 PM
n0
├── n0-0-dn0
├── n0-0-dn1
├── n0-0-dn2
├── n0-0-dn3
├── n0-0-dn4
├── n0-0-dn5
├── n0-0-dn6
├── n0-0-dn7
├── n0-0-dn8
└── n0-0-dn9
The n0-0-dn0 ... n0-0-dn9 nodes are running in parallel. If one of them (e.g. n0-0-dn9) fails, all the other nodes (n0-0-dn0 ... n0-0-dn8) will be aborted by Flyte.
Is this the intended behavior? Is this configurable? Re-running all the nodes due to a small intermittent issue in one of the nodes could generate extra computational cost.
@full-toddler-5766 @gentle-state-35322 @careful-holiday-56196
thankful-minister-83577
FAIL_AFTER_EXECUTABLE_NODES_COMPLETE
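For context, here is a rough sketch of how the two workflow failure policies differ for the fan-out described above. It is a toy model in plain Python, not Flyte's scheduler: `run_fanout`, the `FailurePolicy` enum, and the "finish order" simplification are all hypothetical and exist only for illustration. In flytekit itself the policy is, as far as I know, set through the `failure_policy` argument of `@workflow` using the `WorkflowFailurePolicy` enum.

```python
from enum import Enum


class FailurePolicy(Enum):
    # Hypothetical stand-in; the names mirror Flyte's two workflow failure policies.
    # In flytekit the real setting would look roughly like:
    #   @workflow(failure_policy=WorkflowFailurePolicy.FAIL_AFTER_EXECUTABLE_NODES_COMPLETE)
    FAIL_IMMEDIATELY = "fail_immediately"
    FAIL_AFTER_EXECUTABLE_NODES_COMPLETE = "fail_after_executable_nodes_complete"


def run_fanout(outcomes, policy):
    """Toy model of independent parallel nodes.

    `outcomes` maps node name -> True (would succeed) or False (fails),
    listed in the order the nodes would finish. Returns node name -> status.
    """
    statuses = {}
    failure_seen = False
    for name, succeeds in outcomes.items():
        if failure_seen and policy is FailurePolicy.FAIL_IMMEDIATELY:
            # Nodes still running when the first failure lands are aborted.
            statuses[name] = "ABORTED"
        elif succeeds:
            statuses[name] = "SUCCEEDED"
        else:
            statuses[name] = "FAILED"
            failure_seen = True
    return statuses


# Assumption for the sketch: n0-0-dn9 fails before the other nodes finish.
finish_order = ["n0-0-dn9"] + [f"n0-0-dn{i}" for i in range(9)]
outcomes = {name: name != "n0-0-dn9" for name in finish_order}

immediate = run_fanout(outcomes, FailurePolicy.FAIL_IMMEDIATELY)
tolerant = run_fanout(outcomes, FailurePolicy.FAIL_AFTER_EXECUTABLE_NODES_COMPLETE)

print(immediate["n0-0-dn0"], tolerant["n0-0-dn0"])
```

In both cases the workflow as a whole still ends up failed; the difference is only whether the independent sibling nodes are allowed to run to completion first.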
aloof-painting-18735
04/17/2024, 12:01 PM