Sören Brunk 03/09/2022, 10:15 PM
The task pods start and go into the Running state. I can sometimes see a few lines of task logs, but then it looks like the pod is deleted long before it's finished. It terminates and is cleaned up. Log says:
Some node execution failed, auto-abort.
So I checked the flytepropeller logs, and there's one error log a few seconds before the pod is deleted. Not sure if it's related:
Stopping container m4dydd0r8y-n0-0
I'm a bit at a loss as to how to debug this further. Any hints are much appreciated. I can also provide the full propeller log if that would help track this down.
Failed to update workflow. Error [Operation cannot be fulfilled on flyteworkflows.flyte.lyft.com "m4dydd0r8y": the object has been modified; please apply your changes to the latest version and try again]
for k8s config? If not, can you set it so that the pods stick around even if the k8s node (machine) gets deleted or something? This will enable you to inspect the Pod even after it fails. The "Failed to update workflow" error tells me one of two things: 1. Maybe something or someone issued an out-of-band delete operation on the CRD. This could be someone directly deleting the CRD, or someone issuing an abort in the UI, Admin API, or flytectl. 2. Another propeller is running and competing to process that CRD (unlikely unless there is an issue with the deployment).
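For context, "the object has been modified" is the conflict Kubernetes returns when an update is based on an older version of an object than the one currently stored, so any write that lands between propeller's read and its update can trigger it. The following is a minimal sketch of that check, not propeller's or the API server's actual code: `Store`, `Conflict`, and the `phase` values are hypothetical names made up for illustration.

```python
class Conflict(Exception):
    """Raised when an update carries a stale resource version."""


class Store:
    """Hypothetical in-memory stand-in for the API server's object storage."""

    def __init__(self):
        self._objects = {}  # name -> (resource_version, spec)

    def create(self, name, spec):
        self._objects[name] = (1, dict(spec))

    def get(self, name):
        version, spec = self._objects[name]
        return version, dict(spec)

    def update(self, name, seen_version, spec):
        current_version, _ = self._objects[name]
        # Reject the write if the caller read an older version.
        if seen_version != current_version:
            raise Conflict(f'"{name}": the object has been modified')
        self._objects[name] = (current_version + 1, dict(spec))
        return current_version + 1


store = Store()
store.create("m4dydd0r8y", {"phase": "Running"})

# Two writers read the same version of the workflow object.
v_propeller, spec_propeller = store.get("m4dydd0r8y")
v_other, spec_other = store.get("m4dydd0r8y")

# The other writer (an abort, or a second propeller) lands its change first.
spec_other["phase"] = "Aborting"
store.update("m4dydd0r8y", v_other, spec_other)

# Propeller's write is now based on a stale version and is rejected.
spec_propeller["phase"] = "Succeeding"
try:
    store.update("m4dydd0r8y", v_propeller, spec_propeller)
    outcome = "updated"
except Conflict as err:
    outcome = f"conflict: {err}"

print(outcome)
```

In this sketch the second update fails because the stored version has already moved on. The usual remedy for a writer is to re-read the object and retry, which is what the "apply your changes to the latest version and try again" part of the error asks for.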
Sören Brunk 03/10/2022, 7:42 AM
CRD instances in all namespaces, right? Is there a way to restrict it to configured namespaces? If not, do you think it's feasible to add such a config option?
options in the propeller config which says "Namespaces to watch for this propeller". I suspect that could be what I'm looking for @Haytham Abuelfutuh https://github.com/flyteorg/flytepropeller/blob/c016dabbfef6037bead59590b42326dabe89f957/pkg/controller/config/config.go#L124
is restricted can only be a single namespace (or
), but not multiple namespaces. Is this assumption correct?