#ask-the-community

Sonja Ericsson

03/14/2023, 5:26 PM
Hey! We have this workflow that even though a task failed, a subworkflow node looks like it is still running in flyteconsole 🧵 Anyone seen this before?
Main execution:
[screenshot]
When I click into the generate-hades-uri-V0 subworkflow nothing is running:

Jay Ganbat

03/14/2023, 7:00 PM
what is the failure policy of your workflow

Sonja Ericsson

03/14/2023, 8:23 PM
is there an easy endpoint/command to tell? I couldn’t see anything on
flytectl get workflow EpisodeJobWorkflow  -p scum -d production --version c5632bb2-bdfe-40dd-a3ae-d8b632bfb849 -o yaml
or
https://flyte.spotify.net/api/v1/workflows/scum/production/EpisodeJobWorkflow/c5632bb2-bdfe-40dd-a3ae-d8b632bfb849
metadata is empty
metadata:
          retries:
            retries: 5
          runtime:
            flavor: java
            type: FLYTE_SDK
            version: 0.0.1
        type: java-task
"metadata": {},
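One possible way to read the failure policy out of that API response is sketched below. It rests on assumptions not confirmed in this thread: that the policy is stored under a key named `onFailure` (or `on_failure`) inside the workflow metadata, and that the default value, `FAIL_IMMEDIATELY`, is dropped from the serialized output, which would explain an empty `metadata`. The helper names are hypothetical.

```python
import json

# Assumption: FAIL_IMMEDIATELY is the default policy and is omitted when serialized.
DEFAULT_POLICY = "FAIL_IMMEDIATELY"


def find_key(obj, names):
    """Depth-first search for the first dict key matching any of `names`."""
    if isinstance(obj, dict):
        for name in names:
            if name in obj:
                return obj[name]
        for value in obj.values():
            found = find_key(value, names)
            if found is not None:
                return found
    elif isinstance(obj, list):
        for item in obj:
            found = find_key(item, names)
            if found is not None:
                return found
    return None


def failure_policy(workflow_json: str) -> str:
    """Return the first on-failure policy found, or the assumed default."""
    doc = json.loads(workflow_json)
    # Both spellings are checked because the key casing is an assumption.
    policy = find_key(doc, ("onFailure", "on_failure"))
    return str(policy) if policy is not None else DEFAULT_POLICY
```

Feeding it the body returned by the workflow URL above would print the policy, or the assumed default when the key is absent. Note that it returns the first match, so a workflow with sub-workflows that set their own policy would need a more targeted lookup.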

Jay Ganbat

03/14/2023, 8:39 PM
hmm is it still stuck at running?
i think the best place to find the issue is to look in the flytepropeller log and the flyteadmin log

Sonja Ericsson

03/14/2023, 8:43 PM
yes 😕
Failed to abort node [wpybz-n4]. Error: [SystemError] system error, caused by: rpc error: code = PermissionDenied desc = Cannot abort an already terminate workflow execution

Dan Rammer (hamersaw)

03/14/2023, 8:48 PM
@Sonja Ericsson do you have access to the FlyteWorkflow CR? So eventing to admin is best-effort in that if there is a failure the UI may not reflect the actual status of execution during aborts.
The above message suggests that the workflow has already been successfully aborted and nothing should be currently running. It's just that the UI was not updated to correctly reflect this.

Sonja Ericsson

03/14/2023, 9:01 PM
yes I have access
kubectl get fly  -n scum-production
doesn’t give me anything
ok, so it's expected the UI may be incorrect?

Dan Rammer (hamersaw)

03/14/2023, 9:03 PM
i believe the CR name should be the execution ID of the workflow. propeller does clean these up after a configurable amount of time, and all of the Pods, etc that are started as part of the workflow evaluation are linked to the CR using k8s ownership. So if the CR has been cleaned up, then we can be sure that nothing related to that workflow is running.
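The ownership link described above can be checked by hand. The following is a minimal sketch, assuming the pods carry a standard Kubernetes `ownerReferences` entry that names the FlyteWorkflow CR and that the CR name equals the execution ID; the function name and the sample execution ID are hypothetical. It works on the parsed output of `kubectl get pods -n scum-production -o json`.

```python
def pods_owned_by(pod_list: dict, execution_id: str, kind: str = "FlyteWorkflow") -> list:
    """Return names of pods whose ownerReferences point at the given CR.

    `pod_list` is the parsed JSON of a Kubernetes PodList.
    Assumption: the owner kind is some casing of "FlyteWorkflow",
    so the comparison is case-insensitive.
    """
    owned = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        for ref in meta.get("ownerReferences", []):
            same_kind = str(ref.get("kind", "")).lower() == kind.lower()
            if same_kind and ref.get("name") == execution_id:
                owned.append(meta.get("name"))
                break  # one matching owner is enough for this pod
    return owned
```

An empty result for the execution ID, together with the CR itself being gone, would be consistent with nothing from that workflow still running.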
do you know what version of flyteadmin / propeller you're running? i've tried to repro this a few different ways now, and i'm unable to find a bug in the UI marking the subworkflow correctly as "aborted".

Sonja Ericsson

03/14/2023, 9:08 PM
makes sense, it stopped showing those "Failed to abort" messages in the logs 10 hours ago
github.com/flyteorg/flytepropeller v1.1.62
FROM ghcr.io/flyteorg/flyteadmin:v1.1.70
ghcr.io/flyteorg/flyteconsole:v1.4.1