# ask-the-community
f
I'm trying to debug a task failure: it stopped after 1h15min and I can't seem to find why. Is there a timeout set in flyte by default somewhere?
d
there are node execution deadlines but they default to something like 24h. so this would not be it. what kind of task is it?
f
A ContainerTask. In the flyte console it just tells me: Some node execution failed, auto-abort.
Ah, wait... does flyte abort all tasks in a workflow if another parallel one fails?
d
ok, do you know which node in the workflow failed? if one fails, flyte aborts all running nodes in the workflow. there is a configuration option to allow the workflow to make as much progress as possible (i.e. not abort currently running nodes)
^^^ exactly
the configuration is called failure policy and can be set on the workflow - i'll find the configuration quick
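To make the two behaviours concrete, here is a rough stdlib-only simulation of nodes that start together. It is a sketch of the semantics described above, not Flyte's scheduler: the `FailurePolicy` enum, the `Node` class, `run_parallel`, and the node names and durations are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from enum import Enum


class FailurePolicy(Enum):
    # Local stand-in for the two policies; not flytekit's own class.
    FAIL_IMMEDIATELY = "fail_immediately"
    FAIL_AFTER_EXECUTABLE_NODES_COMPLETE = "fail_after_executable_nodes_complete"


@dataclass(frozen=True)
class Node:
    name: str
    minutes: int          # how long the node runs before it finishes or fails
    fails: bool = False   # whether the node ends in failure


def run_parallel(nodes, policy):
    """Return {name: 'SUCCEEDED' | 'FAILED' | 'ABORTED'} for nodes started together."""
    failure_times = [n.minutes for n in nodes if n.fails]
    first_failure = min(failure_times) if failure_times else None
    result = {}
    for n in nodes:
        # Under FAIL_IMMEDIATELY, anything still running at the first failure is aborted.
        cut_short = (
            policy is FailurePolicy.FAIL_IMMEDIATELY
            and first_failure is not None
            and n.minutes > first_failure
        )
        if cut_short:
            result[n.name] = "ABORTED"
        elif n.fails:
            result[n.name] = "FAILED"
        else:
            result[n.name] = "SUCCEEDED"
    return result


nodes = [
    Node("quick_check", minutes=10),
    Node("sibling", minutes=75, fails=True),   # fails after 1h15min
    Node("container_task", minutes=180),       # would need 3h to finish
]

for policy in FailurePolicy:
    print(policy.name, run_parallel(nodes, policy))
```

With the default-style policy the long-running node is cut off when its sibling fails at the 75-minute mark; with the other policy it runs to completion even though the workflow as a whole still ends up failed.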
here are the available failure policies and it looks like they're set here with failure_policy
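For reference, a minimal sketch of what setting this on a workflow might look like in flytekit. It assumes a flytekit version that exports `WorkflowFailurePolicy`; the task and workflow names are made up for illustration, and plain Python tasks stand in for the ContainerTask from the thread.

```python
from flytekit import WorkflowFailurePolicy, task, workflow


@task
def flaky_step() -> str:
    # Hypothetical task that fails, standing in for the node that broke the run.
    raise RuntimeError("simulated failure")


@task
def long_running_step() -> str:
    # Hypothetical stand-in for the non-idempotent task that should not be aborted.
    return "done"


# Let already-runnable nodes finish instead of aborting them on the first failure.
@workflow(failure_policy=WorkflowFailurePolicy.FAIL_AFTER_EXECUTABLE_NODES_COMPLETE)
def my_wf() -> str:
    flaky_step()
    return long_running_step()
```

The workflow still ends in a failed state; the policy only changes whether nodes that can still run are allowed to complete first.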
f
ah, thanks @Dan Rammer (hamersaw) Can I change the default somewhere?
d
So right now I do not believe we expose a default configuration. It should be a relatively easy add if you're looking to make another contribution, otherwise it would be great to file an issue to get this on the roadmap.
f
hm... looks like flyteidl would need to be adapted for that...
or were you thinking a default in flytekit somehow?
I guess fail_immediately is a good default for most cases, just in my case this container task is not idempotent and I don't want it to be aborted... IMHO there are bigger fish to fry and changing flyteidl here is quite a bit more work than my case warrants... I'll just set it on the workflow decorator.
Thanks for the quick help though!
y
what are the other fish out of curiosity?
#3065?
f
That as well, but to me more importantly the possibility to add a pod spec to ContainerTasks or otherwise specify runtimeClass and affinity... That is a blocker right now
y
do pod templates not work for that?
f
For the runtimeClass, I use pod templates as a workaround. But affinity I need to set per task.