dry-umbrella-88669
05/16/2023, 9:38 PM
thankful-minister-83577
dry-umbrella-88669
05/17/2023, 3:24 PM
Workflow[apfelstrudel:development:apfelstrudel.flyte.workflows.search_workflow_fragmentlevel.hpo_deeplearning_cpu] failed. RuntimeExecutionError: max number of system retry attempts [51/50] exhausted. Last known status message: failed at Node[n1]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [k8s-array]: panic when executing a plugin [k8s-array]. Stack: [goroutine 232 [running]:
runtime/debug.Stack()
/usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/flyteorg/flytepropeller/pkg/controller/nodes/task.Handler.invokePlugin.func1.1()
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/handler.go:390 +0xfe
panic({0x2276320, 0x3f6e0d0})
/usr/local/go/src/runtime/panic.go:838 +0x207
github.com/flyteorg/flytestdlib/bitarray.(*BitSet).IsSet(...)
/go/pkg/mod/github.com/flyteorg/flytestdlib@v1.0.15/bitarray/bitset.go:33
github.com/flyteorg/flyteplugins/go/tasks/plugins/array/core.InitializeExternalResources({0x2b96d18, 0xc01dd23a70}, {0x2ba3080?, 0xc00b0aa2c0?}, 0xc01652ab40, 0x279b758)
/go/pkg/mod/github.com/flyteorg/flyteplugins@v1.0.45/go/tasks/plugins/array/core/metadata.go:33 +0x1e1
github.com/flyteorg/flyteplugins/go/tasks/plugins/array/k8s.Executor.Handle({{0x7f74cd1c7128, 0xc0007ae680}, {{0x2b89030, 0xc004753760}}, {{0x2b89030, 0xc004753810}}}, {0x2b96d18, 0xc01dd23a70}, {0x2ba3080, 0xc00b0aa2c0})
/go/pkg/mod/github.com/flyteorg/flyteplugins@v1.0.45/go/tasks/plugins/array/k8s/executor.go:96 +0x268
github.com/flyteorg/flytepropeller/pkg/controller/nodes/task.Handler.invokePlugin.func1(0x1?, {0x2b96d18, 0xc01dd236e0}, {0x2b99458?, 0xc0048ee810?}, 0x0?)
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/handler.go:397 +0x178
github.com/flyteorg/flytepropeller/pkg/controller/nodes/task.Handler.invokePlugin({{0x2b98958, 0xc0014c90f8}, {0x2b83398, 0xc0014baa60}, 0xc001683920, 0xc001683950, 0xc001683980, {0x2b98998, 0xc020838640}, 0xc0003a4000, ...}, ...)
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/handler.go:399 +0x9a
github.com/flyteorg/flytepropeller/pkg/controller/nodes/task.Handler.Handle({{0x2b98958, 0xc0014c90f8}, {0x2b83398, 0xc0014baa60}, 0xc001683920, 0xc001683950, 0xc001683980, {0x2b98998, 0xc020838640}, 0xc0003a4000, ...}, ...)
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/handler.go:672 +0x1de5
github.com/flyteorg/flytepropeller/pkg/controller/nodes/dynamic.dynamicNodeTaskNodeHandler.handleParentNode({{0x2b9a388, 0xc0004000d0}, {{0xc0005b29c0, {{...}, 0x0}, {0xc000fa4000, 0x4, 0x4}}, {0xc0005b29e0, {{...}, ...}, ...}, ...}, ...}, ...)
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/dynamic/handler.go:70 +0xd8
github.com/flyteorg/flytepropeller/pkg/controller/nodes/dynamic.dynamicNodeTaskNodeHandler.Handle({{0x2b9a388, 0xc0004000d0}, {{0xc0005b29c0, {{...}, 0x0}, {0xc000fa4000, 0x4, 0x4}}, {0xc0005b29e0, {{...}, ...}, ...}, ...}, ...}, ...)
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/dynamic/handler.go:224 +0x9d0
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).execute(0xc0003e20c0, {0x2b96d18, 0xc01dd23050}, {0x2b98798, 0xc000a38280}, 0xc0318a0000, {0x2baced0?, 0xc004faa4e0?})
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:460 +0x157
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).handleQueuedOrRunningNode(0xc0003e20c0, {0x2b96d18, 0xc01dd23050}, 0xc0318a0000, {0x2b98798?, 0xc000a38280?})
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:593 +0x227
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).handleNode(0xc0003e20c0, {0x2b96d18, 0xc01dd23050}, {0x2b7b760, 0xc00a42cf00}, 0xc0318a0000, {0x2b98798?, 0xc000a38280})
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:820 +0x3c5
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).RecursiveNodeHandler(0xc0003e20c0, {0x2b96d18, 0xc01dd22a20}, {0x2ba7398, 0xc043119770}, {0x2b7b760, 0xc00a42cf00}, {0x2b96ff0?, 0xc00a42cf00?}, {0x2ba4d50, ...})
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:1018 +0x705
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).handleDownstream(0x26ae306?, {0x2b96d18, 0xc01dd22a20}, {0x2ba7398, 0xc043119770}, {0x2b7b760, 0xc00a42cf00?}, {0x2b96ff0?, 0xc00a42cf00}, {0x2ba4d50, ...})
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:858 +0x3c5
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).RecursiveNodeHandler(0xc0003e20c0, {0x2b96d18, 0xc01dd22a20}, {0x2ba7398, 0xc043119770}, {0x2b7b760, 0xc00a42cf00}, {0x2b96ff0?, 0xc00a42cf00?}, {0x2ba4d50, ...})
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:1025 +0x935
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).handleDownstream(0x26ae306?, {0x2b96d18, 0xc01dd22a20}, {0x2ba7398, 0xc043119770}, {0x2b7b760, 0xc00a42cf00?}, {0x2b96ff0?, 0xc00a42cf00}, {0x2ba4d50, ...})
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:858 +0x3c5
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).RecursiveNodeHandler(0xc0003e20c0, {0x2b96d18, 0xc01dd22a20}, {0x2ba7398, 0xc043119770}, {0x2b7b760, 0xc00a42cf00}, {0x2b96ff0?, 0xc00a42cf00?}, {0x2ba4d50, ...})
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:1025 +0x935
github.com/flyteorg/flytepropeller/pkg/controller/workflow.(*workflowExecutor).handleRunningWorkflow(0xc0008a9dc0, {0x2b96d18, 0xc01dd22a20}, 0xc00a42cf00)
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/workflow/executor.go:147 +0x1b3
github.com/flyteorg/flytepropeller/pkg/controller/workflow.(*workflowExecutor).HandleFlyteWorkflow(0xc0008a9dc0, {0x2b96d18, 0xc01dd22a20}, 0xc00a42cf00)
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/workflow/executor.go:393 +0x40f
github.com/flyteorg/flytepropeller/pkg/controller.(*Propeller).TryMutateWorkflow.func2(0xc000b93300, {0x2b96d18, 0xc01dd22a20}, 0xc00567b7d0, 0x2130000?)
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/handler.go:142 +0x18e
github.com/flyteorg/flytepropeller/pkg/controller.(*Propeller).TryMutateWorkflow(0xc000b93300, {0x2b96d18, 0xc01dd225d0}, 0xc013aa6500)
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/handler.go:143 +0x495
github.com/flyteorg/flytepropeller/pkg/controller.(*Propeller).Handle(0xc000b93300, {0x2b96d18, 0xc01dd225d0}, {0xc01764c9c0, 0x18}, {0xc01764c9d9, 0x14})
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/handler.go:259 +0xe4a
github.com/flyteorg/flytepropeller/pkg/controller.(*WorkerPool).processNextWorkItem.func1(0xc00032b9e0, 0xc00567bf28, {0x2130000?, 0xc045a12cc0})
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/workers.go:88 +0x510
github.com/flyteorg/flytepropeller/pkg/controller.(*WorkerPool).processNextWorkItem(0xc00032b9e0, {0x2b96d18, 0xc01dd225d0})
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/workers.go:99 +0xf1
github.com/flyteorg/flytepropeller/pkg/controller.(*WorkerPool).runWorker(0x2b96d18?, {0x2b96d18, 0xc019212030})
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/workers.go:115 +0xbd
github.com/flyteorg/flytepropeller/pkg/controller.(*WorkerPool).Run.func1()
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/workers.go:150 +0x59
created by github.com/flyteorg/flytepropeller/pkg/controller.(*WorkerPool).Run
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/workers.go:147 +0x285
]
is the error at the top of the workflow panel, and:
max number of system retry attempts [51/50] exhausted - system failure.
is the one shown in the map task side panel.
dry-umbrella-88669
05/17/2023, 3:26 PM
> kubectl describe po | grep Image:
Image: gcr.io/cloudsql-docker/gce-proxy:1.14
Image: gcr.io/cloudsql-docker/gce-proxy:1.14
Image: gcr.io/cloudsql-docker/gce-proxy:1.14
Image: cr.flyte.org/flyteorg/datacatalog-release:v1.4.3
Image: cr.flyte.org/flyteorg/datacatalog-release:v1.4.3
Image: cr.flyte.org/flyteorg/flyteadmin-release:v1.4.3
Image: cr.flyte.org/flyteorg/flyteadmin-release:v1.4.3
Image: cr.flyte.org/flyteorg/flyteadmin-release:v1.4.3
Image: cr.flyte.org/flyteorg/flyteadmin-release:v1.4.3
Image: cr.flyte.org/flyteorg/flyteconsole-release:v1.4.3
Image: cr.flyte.org/flyteorg/flytepropeller:v1.1.78
hallowed-mouse-14616
05/17/2023, 7:54 PM
array size > max allowed. requested [%v]. allowed [%v]?
My guess is that this lookup phase is returning an error which is not caught here. Will submit a bug fix soon, but want to see if we can work around it.
thankful-minister-83577
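The guess above (a lookup that returns an error which the caller does not check, followed by a panic inside a bit-set read) can be reproduced in isolation. The following is a minimal Go sketch under stated assumptions: `bitSet`, `lookup`, `uncheckedIsSet`, and `checkedIsSet` are hypothetical stand-ins written for illustration, not the flytestdlib or flyteplugins code; only the error message format is taken from the thread.

```go
package main

import "fmt"

// bitSet is a simplified, hypothetical bit set (NOT the flytestdlib type).
type bitSet []uint32

// isSet dereferences the receiver without a nil check, so calling it on a
// nil *bitSet panics with a nil pointer dereference.
func (s *bitSet) isSet(i uint) bool {
	return (*s)[i/32]&(1<<(i%32)) != 0
}

// lookup is a hypothetical stand-in for the "lookup phase": it returns an
// error, and a nil bit set, when the requested size exceeds the maximum.
func lookup(size, maxSize int) (*bitSet, error) {
	if size > maxSize {
		return nil, fmt.Errorf("array size > max allowed. requested [%v]. allowed [%v]", size, maxSize)
	}
	b := make(bitSet, (size+31)/32)
	return &b, nil
}

// uncheckedIsSet drops the error from lookup, then reads the bit set.
// recover is used here only so the sketch can report the panic.
func uncheckedIsSet(size, maxSize int, idx uint) (result bool, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	bs, _ := lookup(size, maxSize) // error ignored: bs may be nil
	return bs.isSet(idx), false
}

// checkedIsSet surfaces the lookup error instead of touching a nil bit set.
func checkedIsSet(size, maxSize int, idx uint) (bool, error) {
	bs, err := lookup(size, maxSize)
	if err != nil {
		return false, err
	}
	return bs.isSet(idx), nil
}

func main() {
	_, panicked := uncheckedIsSet(6000, 5000, 0)
	fmt.Println("unchecked panicked:", panicked)

	_, err := checkedIsSet(6000, 5000, 0)
	fmt.Println("checked error:", err)
}
```

With the error checked, the caller sees the "array size > max allowed" message directly rather than a panic that is retried until the system retry budget is exhausted.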
dry-umbrella-88669
05/17/2023, 8:08 PM
hallowed-mouse-14616
05/17/2023, 8:35 PM
maxArrayJobSize value at something like plugins.k8s-array.maxArrayJobSize and then other specific configuration if caching is enabled (is it?). The default value is something like 5000, but I'm sure you've increased it.
hallowed-mouse-14616
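To make the dotted path `plugins.k8s-array.maxArrayJobSize` concrete, here is a small Go sketch of how such a path maps onto nested configuration sections. It is illustrative only: `lookupInt` and `defaultMaxArrayJobSize` are hypothetical names, the nested map stands in for a parsed config file, and the default of 5000 is the approximate figure quoted in the thread rather than a verified value.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultMaxArrayJobSize mirrors the "something like 5000" default mentioned
// in the thread; treat the exact number as an assumption.
const defaultMaxArrayJobSize = 5000

// lookupInt walks a nested config map along a dotted path and returns the
// integer found there, or fallback if a segment is missing or not an int.
func lookupInt(cfg map[string]any, path string, fallback int) int {
	var cur any = cfg
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return fallback
		}
		cur, ok = m[key]
		if !ok {
			return fallback
		}
	}
	if n, ok := cur.(int); ok {
		return n
	}
	return fallback
}

func main() {
	// Hypothetical parsed configuration with the limit raised to 8000.
	cfg := map[string]any{
		"plugins": map[string]any{
			"k8s-array": map[string]any{"maxArrayJobSize": 8000},
		},
	}
	path := "plugins.k8s-array.maxArrayJobSize"

	fmt.Println(lookupInt(cfg, path, defaultMaxArrayJobSize))              // prints 8000
	fmt.Println(lookupInt(map[string]any{}, path, defaultMaxArrayJobSize)) // prints 5000
}
```

The point of the sketch is only the nesting: `maxArrayJobSize` sits under a `k8s-array` section inside `plugins`, and an unset key falls back to the default.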
05/17/2023, 8:54 PM
maxArrayJobSize parameter.
dry-umbrella-88669
05/17/2023, 10:46 PM
dry-umbrella-88669
05/17/2023, 10:48 PM
hallowed-mouse-14616
05/18/2023, 8:07 PM
dry-umbrella-88669
05/18/2023, 8:25 PM
hallowed-mouse-14616
05/22/2023, 8:06 PM