Got an interesting panic in flyte propeller / k8s-...
# ask-the-community
c
Got an interesting panic in flyte propeller / k8s-array plugin on the first execution (i.e., uncached) of a map task. Around the same time, we're seeing lots of new "Dataset does not exist key" warning logs in datacatalog. We have other map tasks that also ran uncached a little before this one, and they didn't encounter this error (i.e., from that fact plus the code, we're pretty sure this isn't the normal "cache missed" message). The map task in question takes a FlyteFile (i.e., a List[FlyteFile] is passed to map_task()) and returns an int. Maybe this has something to do with FlyteFiles and the data catalog? Will put the full panic trace in the thread.
Workflow[relative-finder:development:relative_finder.workflows.relative_finder.relative_finder_wf] failed. RuntimeExecutionError: max number of system retry attempts [51/50] exhausted. Last known status message: failed at Node[n1]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [k8s-array]: panic when executing a plugin [k8s-array]. Stack: [goroutine 760 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/flyteorg/flytepropeller/pkg/controller/nodes/task.Handler.invokePlugin.func1.1()
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/handler.go:375 +0xfe
panic({0x1f3c580, 0x3952540})
	/usr/local/go/src/runtime/panic.go:838 +0x207
github.com/flyteorg/flytestdlib/bitarray.(*BitSet).IsSet(...)
	/go/pkg/mod/github.com/flyteorg/flytestdlib@v1.0.4/bitarray/bitset.go:33
github.com/flyteorg/flyteplugins/go/tasks/plugins/array/core.InitializeExternalResources({0x2796db0, 0xc01870b290}, {0x27a2700?, 0xc00116c4d0?}, 0xc002130240, 0x23cf0a8)
	/go/pkg/mod/github.com/flyteorg/flyteplugins@v1.0.8/go/tasks/plugins/array/core/metadata.go:33 +0x1e1
github.com/flyteorg/flyteplugins/go/tasks/plugins/array/k8s.Executor.Handle({{0x7fe260ad1090, 0xc00098afc0}, {{0x2789d50, 0xc0018ca0b0}}, {{0x2789d50, 0xc0018ca160}}}, {0x2796db0, 0xc01870b290}, {0x27a2700, 0xc00116c4d0})
	/go/pkg/mod/github.com/flyteorg/flyteplugins@v1.0.8/go/tasks/plugins/array/k8s/executor.go:94 +0x225
github.com/flyteorg/flytepropeller/pkg/controller/nodes/task.Handler.invokePlugin.func1(0x0?, {0x2796db0, 0xc01870b050}, {0x2799298?, 0xc0007daf00?}, 0x0?)
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/handler.go:382 +0x178
github.com/flyteorg/flytepropeller/pkg/controller/nodes/task.Handler.invokePlugin({{0x27970f8, 0xc0011919b0}, {0x27848f0, 0xc000d37da0}, 0xc00143ccc0, 0xc00143ccf0, 0xc00143cd20, {0x2798818, 0xc001710000}, 0xc0018522c0, ...}, ...)
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/handler.go:384 +0x9a
github.com/flyteorg/flytepropeller/pkg/controller/nodes/task.Handler.Handle({{0x27970f8, 0xc0011919b0}, {0x27848f0, 0xc000d37da0}, 0xc00143ccc0, 0xc00143ccf0, 0xc00143cd20, {0x2798818, 0xc001710000}, 0xc0018522c0, ...}, ...)
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/handler.go:617 +0x182b
github.com/flyteorg/flytepropeller/pkg/controller/nodes/dynamic.dynamicNodeTaskNodeHandler.handleParentNode({{0x279a148, 0xc000a5cdd0}, {{0xc000b258c0, {{...}, 0x0}, {0xc0009e8440, 0x4, 0x4}}, {0xc000b258e0, {{...}, ...}, ...}, ...}, ...}, ...)
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/dynamic/handler.go:70 +0xd8
github.com/flyteorg/flytepropeller/pkg/controller/nodes/dynamic.dynamicNodeTaskNodeHandler.Handle({{0x279a148, 0xc000a5cdd0}, {{0xc000b258c0, {{...}, 0x0}, {0xc0009e8440, 0x4, 0x4}}, {0xc000b258e0, {{...}, ...}, ...}, ...}, ...}, ...)
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/dynamic/handler.go:220 +0x9d0
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).execute(0xc00131e240, {0x2796db0, 0xc01870aba0}, {0x2798698, 0xc00143e000}, 0xc001082600, {0x27ab1b8?, 0xc001e981a0?})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:382 +0x157
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).handleQueuedOrRunningNode(0xc00131e240, {0x2796db0, 0xc01870aba0}, 0xc001082600, {0x2798698?, 0xc00143e000?})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:512 +0x227
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).handleNode(0xc00131e240, {0x2796db0, 0xc01870aba0}, {0x7fe2606e34d0, 0xc0050f5490}, 0xc001082600, {0x2798698?, 0xc00143e000})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:736 +0x3c5
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).RecursiveNodeHandler(0xc00131e240, {0x2796db0, 0xc01870a420}, {0x27a63e8, 0xc01ac29400}, {0x7fe2606e34d0, 0xc0050f5490}, {0x2784a30?, 0xc018156b60?}, {0x27a3810, ...})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:934 +0x705
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).handleDownstream(0x22f4f2d?, {0x2796db0, 0xc01870a420}, {0x27a63e8, 0xc01ac29400}, {0x7fe2606e34d0, 0xc0050f5490?}, {0x2784a30?, 0xc018156b60}, {0x27a3810, ...})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:774 +0x3c5
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).RecursiveNodeHandler(0xc00131e240, {0x2796db0, 0xc01870a420}, {0x27a63e8, 0xc01ac29400}, {0x7fe2606e34d0, 0xc0050f5490}, {0x2784a30?, 0xc018156b60?}, {0x27a3810, ...})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:941 +0x935
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).handleDownstream(0x22f4f2d?, {0x2796db0, 0xc01870a420}, {0x27a63e8, 0xc01ac29400}, {0x7fe2606e34d0, 0xc0050f5490?}, {0x2784a30?, 0xc018156b60}, {0x27a3810, ...})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:774 +0x3c5
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).RecursiveNodeHandler(0xc00131e240, {0x2796db0, 0xc01870a420}, {0x27a63e8, 0xc01ac29400}, {0x7fe2606e34d0, 0xc0050f5490}, {0x2784a30?, 0xc018156b60?}, {0x27a3810, ...})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:941 +0x935
github.com/flyteorg/flytepropeller/pkg/controller/nodes/subworkflow.(*subworkflowHandler).handleSubWorkflow(0xc000490c68, {0x2796db0, 0xc01870a420}, {0x27a4130, 0xc001082540}, {0x27a1b40, 0xc0050f5490}, {0x2784a30, 0xc018156b60})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/subworkflow/subworkflow.go:74 +0x334
github.com/flyteorg/flytepropeller/pkg/controller/nodes/subworkflow.(*subworkflowHandler).CheckSubWorkflowStatus(0xc00af11c50?, {0x2796db0, 0xc01870a420}, {0x27a4130?, 0xc001082540?})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/subworkflow/subworkflow.go:226 +0x3f1
github.com/flyteorg/flytepropeller/pkg/controller/nodes/subworkflow.(*workflowNodeHandler).Handle(0xc000490c40, {0x2796db0, 0xc01870a420}, {0x27a4130?, 0xc001082540?})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/subworkflow/handler.go:91 +0x1690
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).execute(0xc00131e240, {0x2796db0, 0xc01870a420}, {0x2798758, 0xc000490c40}, 0xc001082540, {0x27ab1b8?, 0xc01bb53d40?})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:382 +0x157
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).handleQueuedOrRunningNode(0xc00131e240, {0x2796db0, 0xc01870a420}, 0xc001082540, {0x2798758?, 0xc000490c40?})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:512 +0x227
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).handleNode(0xc00131e240, {0x2796db0, 0xc01870a420}, {0x277d258, 0xc002cae500}, 0xc001082540, {0x2798758?, 0xc000490c40})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:736 +0x3c5
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).RecursiveNodeHandler(0xc00131e240, {0x2796db0, 0xc01870a0c0}, {0x27a63e8, 0xc01ac29360}, {0x277d258, 0xc002cae500}, {0x277d280?, 0xc002cae500?}, {0x27a3810, ...})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:934 +0x705
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).handleDownstream(0x22f4f2d?, {0x2796db0, 0xc01870a0c0}, {0x27a63e8, 0xc01ac29360}, {0x277d258, 0xc002cae500?}, {0x277d280?, 0xc002cae500}, {0x27a3810, ...})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:774 +0x3c5
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).RecursiveNodeHandler(0xc00131e240, {0x2796db0, 0xc01870a0c0}, {0x27a63e8, 0xc01ac29360}, {0x277d258, 0xc002cae500}, {0x277d280?, 0xc002cae500?}, {0x27a3810, ...})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:941 +0x935
github.com/flyteorg/flytepropeller/pkg/controller/workflow.(*workflowExecutor).handleRunningWorkflow(0xc000491e30, {0x2796db0, 0xc01870a0c0}, 0xc002cae500)
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/workflow/executor.go:147 +0x1b3
github.com/flyteorg/flytepropeller/pkg/controller/workflow.(*workflowExecutor).HandleFlyteWorkflow(0xc000491e30, {0x2796db0, 0xc01870a0c0}, 0xc002cae500)
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/workflow/executor.go:393 +0x40f
github.com/flyteorg/flytepropeller/pkg/controller.(*Propeller).TryMutateWorkflow.func2(0xc00145f0e0, {0x2796db0, 0xc01870a0c0}, 0xc010bf3848, 0x1e51040?)
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/handler.go:130 +0x18e
github.com/flyteorg/flytepropeller/pkg/controller.(*Propeller).TryMutateWorkflow(0xc00145f0e0, {0x2796db0, 0xc0186b5230}, 0xc002156f00)
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/handler.go:131 +0x459
github.com/flyteorg/flytepropeller/pkg/controller.(*Propeller).Handle(0xc00145f0e0, {0x2796db0, 0xc0186b5230}, {0xc004980330, 0x19}, {0xc00498034a, 0x8})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/handler.go:205 +0x86d
github.com/flyteorg/flytepropeller/pkg/controller.(*WorkerPool).processNextWorkItem.func1(0xc00189ac60, 0xc010bf3f28, {0x1e51040?, 0xc003db8440})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/workers.go:88 +0x510
github.com/flyteorg/flytepropeller/pkg/controller.(*WorkerPool).processNextWorkItem(0xc00189ac60, {0x2796db0, 0xc0186b5230})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/workers.go:99 +0xf1
github.com/flyteorg/flytepropeller/pkg/controller.(*WorkerPool).runWorker(0x2796db0?, {0x2796db0, 0xc0007cda10})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/workers.go:115 +0xbd
github.com/flyteorg/flytepropeller/pkg/controller.(*WorkerPool).Run.func1()
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/workers.go:150 +0x59
created by github.com/flyteorg/flytepropeller/pkg/controller.(*WorkerPool).Run
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/workers.go:147 +0x285
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/workers.go:147 +0x285
]
Line 94 of executor.go doesn't seem that interesting to me yet... there is already a nil check there on the err object
state.IndexesToCache.IsSet(uint(i)) in go/tasks/plugins/array/core/metadata.go?
Let me see what sets the length of that array
I'm too bad at Go to figure out why the "..." empty-parameter symbol is used in the stack trace for github.com/flyteorg/flytestdlib/bitarray.(*BitSet).IsSet(...). Could that be because IndexesToCache itself is nil, so the integer for the position isn't used yet? (Edit: err, I think this is also what you get when the function is inlined by the compiler.)
I think this gets set by state.SetIndexesToCache in one of a few places in go/tasks/plugins/array/catalog.go
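To make the nil hypothesis concrete, here is a minimal Go sketch. The `BitSet` type below is a simplified stand-in written for illustration and is an assumption, not the flytestdlib implementation; `SafeIsSet` and `panics` are likewise hypothetical helpers. It shows that calling a method on a nil pointer is legal in Go and only panics once the receiver is dereferenced, which is what would happen if nothing ever set IndexesToCache. Separately, Go tracebacks print `(...)` for the arguments of inlined calls, so that symbol on its own does not prove the receiver was nil.

```go
package main

import "fmt"

// BitSet is a hypothetical, simplified bit set backed by 32-bit blocks.
// It is NOT the flytestdlib type; it only illustrates the failure mode.
type BitSet []uint32

const blockSize = 32

// IsSet dereferences the receiver with no nil or bounds check, so a nil
// receiver (or an index past the last block) causes a runtime panic.
func (s *BitSet) IsSet(index uint) bool {
	return (*s)[index/blockSize]&(1<<(index%blockSize)) != 0
}

// SafeIsSet is a guarded variant: a nil receiver or out-of-range index is
// reported as "not set" instead of panicking.
func (s *BitSet) SafeIsSet(index uint) bool {
	if s == nil || index/blockSize >= uint(len(*s)) {
		return false
	}
	return s.IsSet(index)
}

// panics runs f and reports whether it panicked.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	var unset *BitSet // never assigned, like a state field nobody populated

	fmt.Println(panics(func() { unset.IsSet(0) })) // true
	fmt.Println(unset.SafeIsSet(0))                // false

	set := BitSet{0b101}
	fmt.Println(set.IsSet(0), set.IsSet(1), set.IsSet(2)) // true false true
}
```

Whether the real fix belongs in the bit set (a guard like the one above) or in the code path that should have called SetIndexesToCache is a separate question this sketch does not answer.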
Gonna look for "Failing to lookup catalog. Will move on to launching the task." in the logs, and maybe some of the surrounding propeller code in catalog.go, to see if I can tell what happened leading up to this... I know datacatalog was giving some errors at this time
I don't see that particular propeller log for this failed execution (I do see it many hours earlier, for an unrelated map task much earlier in the DAG)
k
Ohh man this is unexpected
Cc @Dan Rammer (hamersaw)
d
Yeah, I have this on my list this morning. Hopefully I will get to it in an hour or so. Agreed though, very unexpected. We'll figure it out - thanks for such in-depth information @Calvin Leather!
c
Thank you!
This error persists across resumes as well
We're focusing on this as well if you need any additional info
k
Can you file an issue? The data is great
c
Indeed, will do for this one too