I have a failed workflow, that I for some reason h...
# ask-the-community
l
I have a failed workflow that, for some reason, had an error that just said “unknown”. I stopped the workflow so I could restart it, and now it doesn't even have the relaunch/recover options.
I have the ability to recover the sub-job, which I have started, but the main workflow's UI is broken or I don't understand something.
k
Is it possible to inspect the console? Seems like a UI bug.
l
A lot of 501s and 404s
Does recovering the lower job stop me from being able to continue the parent job?
I do notice the parent workflow duration is going up
though it simply says failed
oh actually it says failing
k
Sorry, not following. Seems the remote workflow failed?
l
Yes, the workflow failed. I want to recover it, but this option is not available.
I do notice that unlike all the other failed workflows this workflow has a status of failing
instead of failed
I think it has to do with the fact that I terminated the job from the sub-job.
How does one force a workflow to fail? In its current state our workflow is stuck in failing.
k
So it’s not a UI bug.
Why is it stuck?
Is the CRD still there?
If the CRD was deleted, then the only option is to mark it as deleted in the DB.
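One way to check whether the CRD is still there is to ask the cluster for the workflow's custom resource. This is a minimal sketch, assuming `kubectl` is configured for the cluster and that executions are stored as `flyteworkflows.flyte.lyft.com` resources named after the execution ID in the project-domain namespace. The helper names are hypothetical, and the execution ID and namespace passed in are whatever your deployment uses.

```python
import subprocess
from typing import List

# Assumption: each execution is stored as a custom resource of this kind.
FLYTE_WORKFLOW_RESOURCE = "flyteworkflows.flyte.lyft.com"


def build_get_crd_command(exec_id: str, namespace: str) -> List[str]:
    """Build the kubectl command that fetches the execution's custom resource."""
    return [
        "kubectl", "get", FLYTE_WORKFLOW_RESOURCE, exec_id,
        "-n", namespace, "-o", "json",
    ]


def crd_exists(exec_id: str, namespace: str) -> bool:
    """Return True if kubectl exits with code 0 for the resource lookup.

    A non-zero exit code can also mean an auth or connectivity problem,
    so a False result is only a hint that the resource is gone.
    """
    result = subprocess.run(
        build_get_crd_command(exec_id, namespace),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0
```

If the lookup succeeds, the JSON on stdout can be inspected for the resource's recorded status before deciding on any database change.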
l
The CRD is still there. Sorry, I'm all over the place; this has been a long set of nights. Yes, not a UI bug. It seems to be stuck after I selected terminate.
the exact error from the workflow is
failed at Node[relative-finder]. RemoteChildWorkflowExecutionFailed: launchplan [fl3afrmg1jj1to] aborted, caused by: launchplan execution aborted
Is there a way to make that job go to failed so I can recover?
What causes a failing status to appear?
Is there anything I can do?
I have searched the Flyte repo looking for failing as a status and I don't see it, outside of a boilerplate e2e test.
Is there nothing I can do here?
This is a fairly large problem for us.
We are currently in the failing state and we need to get to the failed state; something has our workflow stuck. What can I check, what can I do, what options do I have?
{"json":{"exec_id":"akwqvqkqc87mndj8vwrv","ns":"e2e-workflows-development","routine":"worker-25"},"level":"warning","msg":"Workflow namespace[e2e-workflows-development]/name[akwqvqkqc87mndj8vwrv] has already been terminated.","ts":"2022-11-22T15:29:30Z"}
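When scanning many log lines like the one above, it can help to pull out just the execution ID, level, and message. This is a minimal sketch that assumes the `│` characters in the pasted log are terminal wrapping and can be dropped; `summarize_log_line` is a hypothetical helper, not part of Flyte.

```python
import json

# The log line from the thread, with the terminal wrapping removed.
LOG_LINE = (
    '{"json":{"exec_id":"akwqvqkqc87mndj8vwrv","ns":"e2e-workflows-development",'
    '"routine":"worker-25"},"level":"warning","msg":"Workflow '
    'namespace[e2e-workflows-development]/name[akwqvqkqc87mndj8vwrv] '
    'has already been terminated.","ts":"2022-11-22T15:29:30Z"}'
)


def summarize_log_line(line: str) -> dict:
    """Parse one structured log line and keep the fields useful for triage."""
    record = json.loads(line)
    context = record.get("json", {})  # nested per-execution context
    return {
        "exec_id": context.get("exec_id"),
        "namespace": context.get("ns"),
        "level": record.get("level"),
        "message": record.get("msg"),
        "timestamp": record.get("ts"),
    }


summary = summarize_log_line(LOG_LINE)
print(summary["exec_id"], summary["level"], summary["message"])
```

Filtering on `exec_id` this way makes it easier to see whether the same "already been terminated" warning repeats for the stuck execution.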
Wow, the Flyte team is so responsive, even when I was being fairly annoying. Thank you for the effort, everyone!
^ this is not sarcasm
k
Seems like a bug in the backend. Was only the remote workflow “failing”? Did any other job fail?
l
Dan Hammer is currently taking a look at it.
k
Is it possible to abort the remote workflow in the UI?
Great,
l
No, you can't.
it seems to have just got a little out of sync
which has led it to staying in the failing state.