# ask-the-community
s
Hey! We have a workflow where, even though a task failed, a subworkflow node still shows as running in flyteconsole 🧵 Has anyone seen this before?
Main execution: [screenshot]
When I click into the `generate-hades-uri-V0` subworkflow, nothing is running: [screenshot]
j
What is the failure policy of your workflow?
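For reference, a workflow's failure policy is stored in its template metadata: in flyteidl, `WorkflowMetadata.on_failure` is an `OnFailurePolicy` enum whose zero value is `FAIL_IMMEDIATELY` (the other value is `FAIL_AFTER_EXECUTABLE_NODES_COMPLETE`). Because proto3 JSON omits zero-valued fields, an empty workflow `metadata` object means the default policy. Below is a minimal Python sketch of reading it from a flyteadmin `/api/v1/workflows/...` response. The JSON path (`closure` → `compiledWorkflow` → `primary` → `template` → `metadata`) and whether keys come back in camelCase or snake_case are assumptions, so the hypothetical `failure_policy` helper tries both.

```python
import json

# Enum names from flyteidl's WorkflowMetadata.OnFailurePolicy; 0 is the default.
ON_FAILURE_POLICIES = {
    0: "FAIL_IMMEDIATELY",
    1: "FAIL_AFTER_EXECUTABLE_NODES_COMPLETE",
}


def _get(d, *keys):
    """Return the value of the first key present in d (tries each spelling), else {}."""
    for key in keys:
        if key in d:
            return d[key]
    return {}


def failure_policy(workflow_json: str) -> str:
    """Report the primary workflow's on-failure policy from a workflow response.

    Assumed shape: closure -> compiledWorkflow -> primary -> template -> metadata.
    A missing onFailure field is treated as the proto3 default (FAIL_IMMEDIATELY).
    """
    wf = json.loads(workflow_json)
    compiled = _get(_get(wf, "closure"), "compiledWorkflow", "compiled_workflow")
    metadata = _get(_get(_get(compiled, "primary"), "template"), "metadata")
    value = metadata.get("onFailure", metadata.get("on_failure", 0))
    # Enums may be serialized either as names or as integers.
    if isinstance(value, int):
        return ON_FAILURE_POLICIES.get(value, f"UNKNOWN({value})")
    return value


if __name__ == "__main__":
    empty = '{"closure": {"compiledWorkflow": {"primary": {"template": {"metadata": {}}}}}}'
    print(failure_policy(empty))  # FAIL_IMMEDIATELY
```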
s
Is there an easy endpoint or command to tell? I couldn't see anything in
```
flytectl get workflow EpisodeJobWorkflow -p scum -d production --version c5632bb2-bdfe-40dd-a3ae-d8b632bfb849 -o yaml
```
or
```
https://flyte.spotify.net/api/v1/workflows/scum/production/EpisodeJobWorkflow/c5632bb2-bdfe-40dd-a3ae-d8b632bfb849
```
metadata is empty
```
metadata:
  retries:
    retries: 5
  runtime:
    flavor: java
    type: FLYTE_SDK
    version: 0.0.1
type: java-task
```
```
"metadata": {},
```
j
hmm is it still stuck at running?
I think the best place to find the issue is the `flytepropeller` log and the `flyteadmin` log.
s
yes 😕
```
Failed to abort node [wpybz-n4]. Error: [SystemError] system error, caused by: rpc error: code = PermissionDenied desc = Cannot abort an already terminate workflow execution
```
d
@Sonja Ericsson do you have access to the FlyteWorkflow CR? Eventing to admin is best-effort: if an event fails to reach admin during an abort, the UI may not reflect the actual status of the execution.
The above message suggests that the workflow has already been successfully aborted and nothing should be running now; the UI just wasn't updated to reflect this.
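To tell whether a propeller log line is this benign "already terminal" case, one could parse it. A minimal sketch: the regex is modeled only on the single log line quoted in this thread (other Flyte versions may word it differently), and `parse_abort_failure` / `AbortFailure` are hypothetical helpers, not part of Flyte.

```python
import re
from typing import NamedTuple, Optional

# Modeled on the one "Failed to abort node" line quoted in this thread (assumption).
ABORT_FAILURE = re.compile(
    r"Failed to abort node \[(?P<node>[^\]]+)\]\..*?"
    r"code = (?P<code>\w+) desc = (?P<desc>.*)$"
)


class AbortFailure(NamedTuple):
    node_id: str
    grpc_code: str
    description: str
    already_terminal: bool


def parse_abort_failure(line: str) -> Optional[AbortFailure]:
    """Extract the node ID and gRPC status from a 'Failed to abort node' log line."""
    m = ABORT_FAILURE.search(line)
    if m is None:
        return None
    desc = m.group("desc").strip()
    # PermissionDenied with "already terminate(d)" means admin refused the abort
    # because the execution was already in a terminal phase.
    already_terminal = m.group("code") == "PermissionDenied" and "already terminate" in desc
    return AbortFailure(m.group("node"), m.group("code"), desc, already_terminal)
```

Lines for which `already_terminal` is true are noise rather than a sign that work is still running.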
s
yes I have access
```
kubectl get fly -n scum-production
```
doesn’t give me anything
OK, so it's expected that the UI may be incorrect?
d
I believe the CR name should be the execution ID of the workflow. Propeller cleans these up after a configurable amount of time, and all of the Pods and other resources started as part of the workflow evaluation are linked to the CR through Kubernetes ownership. So if the CR has been cleaned up, we can be sure that nothing related to that workflow is running.
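The ownership link lives in each dependent object's `metadata.ownerReferences`, which the Kubernetes garbage collector uses to delete dependents once their owner is gone. A minimal sketch that lists which objects, shaped like the `items` of `kubectl get pods -n scum-production -o json`, point at a given owner UID. `owned_by` is a hypothetical helper, and the UID would come from the FlyteWorkflow CR's own `metadata.uid`.

```python
from typing import Dict, List


def owned_by(resources: List[Dict], owner_uid: str) -> List[str]:
    """Return names of resources whose metadata.ownerReferences include owner_uid."""
    names = []
    for obj in resources:
        refs = obj.get("metadata", {}).get("ownerReferences", [])
        # An object can have several owners; a match on any one of them counts.
        if any(ref.get("uid") == owner_uid for ref in refs):
            names.append(obj["metadata"]["name"])
    return names
```

Once the CR is deleted, the garbage collector removes its dependents, so a query like this against live Pods should come back empty.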
Do you know which versions of flyteadmin and flytepropeller you're running? I've tried to reproduce this a few different ways now, and each time the UI correctly marks the subworkflow as "aborted", so I haven't been able to find a bug.
s
Makes sense. The "Failed to abort node" messages stopped showing up in the logs 10 hours ago.
github.com/flyteorg/flytepropeller v1.1.62
FROM ghcr.io/flyteorg/flyteadmin:v1.1.70
ghcr.io/flyteorg/flyteconsole:v1.4.1