Hey, we use map task which runs in parallel 5 pods...
# ask-the-community
a
Hey, we use a map task which runs 5 pods in parallel - one of them failed and it caused all the other pods to be stuck in the “running” phase even though they had finished their process (there were no k8s pods anymore, but we still saw them in the console UI). When we tried to execute a workflow, its pods were stuck in the “pending” status (since we configured a maximum of 5 tasks in parallel). It looks like a bug… Do you have any idea how we can solve this issue? cc: @Nizar Hattab
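(For reference, a minimal sketch of the kind of setup being described, assuming the 5-way limit comes from flytekit's map_task concurrency argument rather than a workflow-level max_parallelism setting; the task name and body are made up for illustration:)

```python
from typing import List

from flytekit import map_task, task, workflow


@task
def process_item(x: int) -> int:
    # Illustrative stand-in for the real per-pod work.
    return x * 2


@workflow
def my_wf(items: List[int]) -> List[int]:
    # concurrency=5 keeps at most 5 mapped pods in flight at a time,
    # matching the "5 tasks in parallel maximum" described above.
    return map_task(process_item, concurrency=5)(x=items)
```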
y
can you elaborate a bit please? do you mean when you subsequently tried to run a new workflow?
on the failed run, what’s the status of the workflow execution as a whole? (like the top bar)
as long as the pods are cleaned up and the top level workflow execution is in a terminal state, it should be fine.
if subsequent workflow runs aren’t running that’s a separate issue
n
The wf is marked as Failed, even though it shouldn't be a failure, as we are passing the min_success_ratio param of 0.25. Those 4 tasks are still marked as running and are blocking other tasks in other workflows from starting.
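(For context, this is roughly how that param is passed; a sketch assuming flytekit's map_task API, with an illustrative task body, and noting that on some flytekit versions the mapped output with min_success_ratio < 1 is declared as a list of optionals:)

```python
from typing import List, Optional

from flytekit import map_task, task, workflow


@task
def process_item(x: int) -> int:
    # Illustrative stand-in for the real per-item work.
    return x * 2


@workflow
def my_wf(items: List[int]) -> List[Optional[int]]:
    # With min_success_ratio=0.25 the map task should be reported as
    # successful as long as at least 25% of the mapped instances succeed;
    # failed instances show up as None in the output list.
    return map_task(process_item, min_success_ratio=0.25)(x=items)
```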
a
The console shows that 4 tasks are still running… although they don’t exist as pods in the Kubernetes cluster. Right now we have only 1 running pod in Kubernetes, but the workflow executions (for all workflows in this domain) continue to be stuck, since there are “4 running tasks” @Yee @Nizar Hattab
For example, the workflow is running but the task is queued
y
still a bit confused by what’s happening - how many total executions are happening here?
could you post full screenshots of each one? feel free to redact things.
a
ok, so from the beginning… we use a map task - when one of these tasks fails, the other running tasks get stuck in that status, which causes other tasks to be queued forever. we use flyte-core on a GKE k8s cluster, but we don’t see these pods running/failed there, so we can’t do anything to fix it…
it’s blocking our development process…