Hello flyte team, we are encountering the followin...
# ask-the-community
m
Hello Flyte team, we are encountering the following error when running one of our larger map tasks. We did not see this error on previous map tasks, but it now reproduces consistently (last 2 runs). Any thoughts on the cause?
Eventually it hits the system retry limit of 50 and crashes.
s
cc: @Dan Rammer (hamersaw)
👀 1
d
@Mike Zhong it sounds like this is only happening on a larger map task and no others? Did you recently upgrade any components?
/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/handler.go:487
this line has to do with checking for the existence of the newer Flyte Deck stuff, but it is peculiar that this would only fail in a single instance.
m
we have not recently updated any components. This particular map task, in our test setting, fanned out 500 tasks
but we have other map tasks in our “pipeline” which fan out to a greater or similar degree
those did not encounter this error
we are in the process of adding additional logging, and enabling cache (most of our other tasks are cache enabled)
d
Sure, what version of FlytePropeller do you have running?
m
looks like v1.1.0
✔️ 1
d
cc @Kevin Su looks like this is the line that panics, thoughts?
it seems like there may be a missing nil check in there somewhere.
k
@Mike Zhong Do you have any other logs (like the log of the map task)? I tried to run a map task that fanned out to 1000 tasks, but didn't get any error.
Update: I ran a larger map task that fanned out to 10000 tasks and got the error below. I used the same example as above, but changed the input to 10000. It panics at this line. cc @Dan Rammer (hamersaw)
/usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/flyteorg/flytepropeller/pkg/controller/nodes/task.Handler.invokePlugin.func1.1()
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/handler.go:375 +0xfe
panic({0x1f45600, 0x3959500})
	/usr/local/go/src/runtime/panic.go:838 +0x207
github.com/flyteorg/flytestdlib/bitarray.(*BitSet).IsSet(...)
	/go/pkg/mod/github.com/flyteorg/flytestdlib@v1.0.4/bitarray/bitset.go:33
github.com/flyteorg/flyteplugins/go/tasks/plugins/array/core.InitializeExternalResources({0x279ca70, 0xc006688a50}, {0x27a8360?, 0xc00154d760?}, 0xc005fd3440, 0x23d7cd8)
	/go/pkg/mod/github.com/flyteorg/flyteplugins@v1.0.5/go/tasks/plugins/array/core/metadata.go:33 +0x1e1
github.com/flyteorg/flyteplugins/go/tasks/plugins/array/k8s.Executor.Handle({{0x7f33215a0ff0, 0xc000a7e380}, {{0x278fa10, 0xc00174a0b0}}, {{0x278fa10, 0xc00174a160}}}, {0x279ca70, 0xc006688a50}, {0x27a8360, 0xc00154d760})
	/go/pkg/mod/github.com/flyteorg/flyteplugins@v1.0.5/go/tasks/plugins/array/k8s/executor.go:94 +0x225
github.com/flyteorg/flytepropeller/pkg/controller/nodes/task.Handler.invokePlugin.func1(0x0?, {0x279ca70, 0xc006688810}, {0x279ef58?, 0xc000bfe240?}, 0x0?)
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/handler.go:382 +0x178
github.com/flyteorg/flytepropeller/pkg/controller/nodes/task.Handler.invokePlugin({{0x279cdb8, 0xc000a5d8d8}, {0x278a5a8, 0xc0009b4aa0}, 0xc0009c7260, 0xc0009c7290, 0xc0009c72c0, {0x279e4d8, 0xc0007f8000}, 0xc0009fc000, ...}, ...)
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/handler.go:384 +0x9a
github.com/flyteorg/flytepropeller/pkg/controller/nodes/task.Handler.Handle({{0x279cdb8, 0xc000a5d8d8}, {0x278a5a8, 0xc0009b4aa0}, 0xc0009c7260, 0xc0009c7290, 0xc0009c72c0, {0x279e4d8, 0xc0007f8000}, 0xc0009fc000, ...}, ...)
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/handler.go:616 +0x182b
github.com/flyteorg/flytepropeller/pkg/controller/nodes/dynamic.dynamicNodeTaskNodeHandler.handleParentNode({{0x279fe08, 0xc0009f00d0}, {{0xc000c86160, {{...}, 0x0}, {0xc000484080, 0x4, 0x4}}, {0xc000c86180, {{...}, ...}, ...}, ...}, ...}, ...)
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/dynamic/handler.go:70 +0xd8
github.com/flyteorg/flytepropeller/pkg/controller/nodes/dynamic.dynamicNodeTaskNodeHandler.Handle({{0x279fe08, 0xc0009f00d0}, {{0xc000c86160, {{...}, 0x0}, {0xc000484080, 0x4, 0x4}}, {0xc000c86180, {{...}, ...}, ...}, ...}, ...}, ...)
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/dynamic/handler.go:220 +0x9d0
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).execute(0xc0009e0000, {0x279ca70, 0xc006688330}, {0x279e358, 0xc000734000}, 0xc00808dec0, {0x27b0d58?, 0xc008983110?})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:382 +0x157
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).handleQueuedOrRunningNode(0xc0009e0000, {0x279ca70, 0xc006688330}, 0xc00808dec0, {0x279e358?, 0xc000734000?})
	/go/src/github.com/flyteorg/flytepropeller/pkg/controller/nodes/executor.go:512 +0x227
github.com/flyteorg/flytepropeller/pkg/controller/nodes.(*nodeExecutor).handleNode(0xc0009e0000, {0x279ca70, 0xc006688330}, {0x2782f10, 0xc005f9cf00}, 0xc00808dec0, {0x279e358?, 0xc000734000})
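The top frames of the trace (runtime/debug/stack.go called from invokePlugin.func1.1) suggest that the plugin call is wrapped in a deferred recover that captures the stack. Here is a minimal Go sketch of that general pattern, and of how a panic that recurs on every attempt could exhaust a retry budget such as the limit of 50 mentioned earlier. The names invokeWithRecover and runWithSystemRetries are hypothetical and are not FlytePropeller's actual API; the retry behavior is an assumption for illustration.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// invokeWithRecover runs fn and converts any panic into an error that
// carries the panic value and the stack captured at recovery time.
// Hypothetical helper: not FlytePropeller's real function.
func invokeWithRecover(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered panic: %v\n%s", r, debug.Stack())
		}
	}()
	return fn()
}

// runWithSystemRetries calls fn until it succeeds or maxRetries attempts
// have failed. It returns the number of attempts made and the last error.
// Assumption: a recovered panic is treated like any other retryable error.
func runWithSystemRetries(fn func() error, maxRetries int) (attempts int, err error) {
	for attempts = 1; attempts <= maxRetries; attempts++ {
		if err = invokeWithRecover(fn); err == nil {
			return attempts, nil
		}
	}
	return maxRetries, err
}

func main() {
	// A deterministic panic fails the same way on every attempt,
	// so retrying only delays the final failure.
	attempts, err := runWithSystemRetries(func() error {
		panic("simulated plugin panic")
	}, 50)
	fmt.Printf("attempts=%d failed=%t\n", attempts, err != nil)
}
```

Under these assumptions, a panic caused by a deterministic bug is never cured by a retry, which would be consistent with a run that fails only after the full retry budget is spent.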
d
@Kevin Su aws-batch or k8s-array plugin? This doesn't seem related to Mike's issue, but we should still resolve it. Create an issue?
m
Hi @Kevin Su. Here is the log for one of the mapped tasks. Unfortunately it's not particularly helpful: we didn't have the logger set, so we don't have an indication of where in our task it failed, but we suspect it failed after it completed. I'm not sure what `panic when reconciling workflow` means, but if you point me to what could throw that error, I could dig a little more.
that error you see is handled, it's more of a warning that we are adding handlers to a non-root logger
k
I just created a PR to fix it. The problem is that tCtx.ow.GetReader() is nil when running map tasks with no output, which causes a nil pointer dereference panic. https://github.com/flyteorg/flytepropeller/pull/465
👀 1
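To illustrate the kind of guard described above, here is a minimal Go sketch of a nil check before using a reader returned by an output writer. The OutputWriter and OutputReader interfaces, the DeckExists method, and the deckExists helper are hypothetical stand-ins chosen for this example; the actual change is in the linked PR and may look different.

```go
package main

import (
	"errors"
	"fmt"
)

// OutputReader and OutputWriter are hypothetical stand-ins for the real
// FlytePropeller interfaces; only the shape needed for the example is shown.
type OutputReader interface {
	DeckExists() (bool, error)
}

type OutputWriter interface {
	GetReader() OutputReader
}

// noOutputWriter models a task that produced no outputs: GetReader is nil.
type noOutputWriter struct{}

func (noOutputWriter) GetReader() OutputReader { return nil }

// deckReader and deckWriter model a task that did produce outputs.
type deckReader struct{ hasDeck bool }

func (r deckReader) DeckExists() (bool, error) { return r.hasDeck, nil }

type deckWriter struct{ reader OutputReader }

func (w deckWriter) GetReader() OutputReader { return w.reader }

// deckExists checks for a deck without dereferencing a nil reader.
// Calling reader.DeckExists() on a nil interface value would panic.
func deckExists(ow OutputWriter) (bool, error) {
	if ow == nil {
		return false, errors.New("output writer is nil")
	}
	reader := ow.GetReader()
	if reader == nil {
		// No outputs were written, so there is nothing to check.
		return false, nil
	}
	return reader.DeckExists()
}

func main() {
	found, err := deckExists(noOutputWriter{})
	fmt.Println("no outputs:", found, err)

	found, err = deckExists(deckWriter{reader: deckReader{hasDeck: true}})
	fmt.Println("with outputs:", found, err)
}
```

The design choice in this sketch is to treat a missing reader as "no deck" rather than as an error, on the assumption that a task with no outputs is a valid case.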
m
interesting root cause, I looked at our mapped out task and we do return an `int`. It's not captured or used though. We ran into an issue where trying to use `.with_overrides()` with a `map_task(task)` where `task` returns nothing fails to compile with `VoidPromise has no attribute with_overrides`. We suspected there was something different between `VoidPromise` and `Promise`, so we made sure all our map tasks returned something, even if it is just a sentinel value. I'd like to see if this fix resolves our issue
👍 1
d
@Mike Zhong it sounds like there may be another bug there with flytekit construction of map tasks. I'm not sure if `with_overrides` is currently supported, but it probably should be. @Kevin Su thanks for the backend fix, do you know anything about the flytekit side?
k
@Mike Zhong I just created a PR to support overriding the resources of a VoidPromise. https://github.com/flyteorg/flytekit/pull/1127
👀 1