Flyte enables production-grade orchestration for machine learning workflows and data processing, and is designed to accelerate the move from local workflows to production.

Flyte

<https://github.com/flyteorg/flytepropeller/pull/598|#598 correct propagation of launchplan start error>
Pull request opened by <https://github.com/hamersaw|hamersaw>
*TL;DR*

Correctly fails a workflow node when its launchplan fails to start on FlyteAdmin.

*Type*

☑︎ Bug Fix
☐ Feature
☐ Plugin

*Are all requirements met?*

☑︎ Code completed
☑︎ Smoke tested
☐ Unit tests added
☑︎ Code documentation added
☐ Any pending items have an associated Issue

*Complete description*

Launchplans are executed in FlytePropeller as `WorkflowNodes`. FlytePropeller executes a launchplan by sending an execution request to FlyteAdmin, which then starts the launchplan, and FlytePropeller stores the execution ID in the `WorkflowNode` state. At each iteration, FlytePropeller checks the status of the FlyteWorkflow CR represented by the execution ID and updates the `WorkflowNode` state accordingly.
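The launch-then-poll cycle described above can be sketched roughly as follows. This is a simplified illustration, not FlytePropeller's actual code: `ExecutionClient`, `WorkflowNodeState`, `Phase`, `Reconcile`, and `fakeClient` are all hypothetical names, and the status lookup is reduced to a single call instead of inspecting a FlyteWorkflow CR.

```go
package main

import "errors"

// Phase is a hypothetical, simplified execution phase.
type Phase int

const (
	PhaseNotStarted Phase = iota
	PhaseRunning
	PhaseSucceeded
	PhaseFailed
)

// ExecutionClient is a hypothetical stand-in for the calls made to FlyteAdmin.
type ExecutionClient interface {
	Launch(proposedID, launchPlan string) error
	GetPhase(executionID string) (Phase, error)
}

// WorkflowNodeState is a hypothetical, simplified version of the per-node state.
type WorkflowNodeState struct {
	ExecutionID string
	Phase       Phase
}

// Reconcile performs one iteration: it launches the launchplan if no execution
// ID is stored yet, otherwise it refreshes the stored phase.
func Reconcile(c ExecutionClient, proposedID, launchPlan string, s *WorkflowNodeState) error {
	if s.ExecutionID == "" {
		if err := c.Launch(proposedID, launchPlan); err != nil {
			return err
		}
		// The execution was started, so remember its ID for later iterations.
		s.ExecutionID = proposedID
		s.Phase = PhaseRunning
		return nil
	}
	phase, err := c.GetPhase(s.ExecutionID)
	if err != nil {
		return err
	}
	s.Phase = phase
	return nil
}

// fakeClient is an in-memory stub used only to exercise the sketch.
type fakeClient struct {
	phases map[string]Phase
}

func (f *fakeClient) Launch(proposedID, launchPlan string) error {
	f.phases[proposedID] = PhaseRunning
	return nil
}

func (f *fakeClient) GetPhase(executionID string) (Phase, error) {
	p, ok := f.phases[executionID]
	if !ok {
		return PhaseNotStarted, errors.New("execution not found: " + executionID)
	}
	return p, nil
}
```

Calling `Reconcile` repeatedly with the same state mirrors the per-iteration check: the first call stores the execution ID, and later calls only update the phase.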

The issue linked below occurs when FlyteAdmin fails to start the launchplan. FlytePropeller detects this failure but keeps the proposed execution ID in the `WorkflowNode` state (<https://github.com/flyteorg/flytepropeller/blob/8446baf92748f6079f43aaa94d1b0b97df233a9e/pkg/controller/nodes/subworkflow/launchplan.go#L112-L114|here>) and transitions the node to a failed state. When FlytePropeller attempts to send an event for this state to FlyteAdmin, FlyteAdmin checks whether the execution ID exists (<https://github.com/flyteorg/flyteadmin/blob/4713861821d9e4195b65b1d20fb56c5974354ef4/pkg/manager/impl/node_execution_manager.go#L151-L164|here>). Because FlyteAdmin failed to start the launchplan, the execution ID does not exist, and the check fails with the `Workflow does not exist` error that we see. Ultimately, FlytePropeller proceeds with aborting the `WorkflowNode`, which is entirely unnecessary.

To fix this, there are two possible solutions:  
(1) If a launchplan fails to start because of a user error (e.g. an invalid type interface), we do not set the execution ID on the `WorkflowNode` state, because the execution was never started. Of course, this means that we trust FlyteAdmin to report user errors only when the launchplan was not able to execute -- I think this is reasonable. *This is implemented in this PR.*  
(2) Allow FlyteAdmin to accept events that report a failed state even when the check for the execution ID's existence fails.
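A minimal sketch of the idea behind solution (1), under stated assumptions: `ErrUserError`, `NodeState`, `Launcher`, and `StartLaunchPlan` are hypothetical names chosen for illustration and do not reflect the actual FlytePropeller types or the exact change in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUserError is a hypothetical sentinel marking failures caused by user input,
// such as an invalid type interface.
var ErrUserError = errors.New("user error")

// NodePhase is a hypothetical, simplified node phase.
type NodePhase string

const (
	NodeRunning NodePhase = "RUNNING"
	NodeFailed  NodePhase = "FAILED"
)

// NodeState is a hypothetical, simplified version of the WorkflowNode state.
type NodeState struct {
	ExecutionID string
	Phase       NodePhase
}

// Launcher is a hypothetical function that asks the admin service to start a
// launchplan under the proposed execution ID.
type Launcher func(proposedID string) error

// StartLaunchPlan records the execution ID only if the launch succeeded.
func StartLaunchPlan(launch Launcher, proposedID string, s *NodeState) error {
	err := launch(proposedID)
	switch {
	case err == nil:
		s.ExecutionID = proposedID
		s.Phase = NodeRunning
		return nil
	case errors.Is(err, ErrUserError):
		// No execution was created, so ExecutionID is left untouched and the
		// failure event does not reference a nonexistent execution.
		s.Phase = NodeFailed
		return nil
	default:
		// Any other error is returned so the caller can decide how to handle it.
		return fmt.Errorf("launching %q: %w", proposedID, err)
	}
}
```

With this shape, a node that fails on a user error carries no execution ID, so there is nothing for the admin-side existence check to reject and nothing to abort afterwards.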

*Tracking Issue*

<https://github.com/unionai/cloud/issues/4172|unionai/cloud#4172>

*Follow-up issue*

_NA_
<https://github.com/flyteorg/flytepropeller|flyteorg/flytepropeller>
GitHub Actions: Build & Push Flytepropeller Image
GitHub Actions: Goreleaser
GitHub Actions: Bump Version
:white_check_mark: 11 other checks have passed
11/14 successful checks

<https://github.com/flyteorg/flytepropeller/pull/598|#598 Fixed correct propagation of launchplan start error>
Pull request ready for review by <https://github.com/hamersaw|hamersaw>
<https://github.com/flyteorg/flytepropeller|flyteorg/flytepropeller>

<https://github.com/flyteorg/flytepropeller/pull/598|#598 correct propagation of launchplan start error>
Pull request merged by <https://github.com/hamersaw|hamersaw>
<https://github.com/flyteorg/flytepropeller|flyteorg/flytepropeller>