# flyte-support
c
Hello. We're seeing an error when a dynamic task fans out to a high number of tasks.
```
Workflow[...] failed. RuntimeExecutionError: max number of system retry attempts [31/30] exhausted. Last known status message: failed at Node[n0]. EventRecordingFailed: failed to record node event, caused by: EventSinkError: Error sending event, caused by [rpc error: code = Unknown desc = unexpected HTTP status code received from server: 413 (Request Entity Too Large); transport: received unexpected content-type "text/html"]
```
Would tweaking the configuration linked here help solve this? https://docs.flyte.org/en/latest/deployment/configuration/performance.html#offloading-static-workflow-information-from-crd
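Concretely, I think the knob that page describes is something like the following in the flyteadmin config, though I may have the exact key wrong, so treat this as my reading of the docs rather than a verified snippet:
```yaml
# Sketch of the FlyteAdmin setting described on that docs page (key name from memory; verify against the docs).
# It stores the compiled workflow closure in blob storage instead of inlining it into the FlyteWorkflow CRD.
flyteadmin:
  useOffloadedWorkflowClosure: true
```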
c
Unfortunately, no. The NodeExecutionEvent (https://github.com/flyteorg/flyte/blob/master/flyteidl/protos/flyteidl/event/event.proto#L42-L133) is the proto message that's getting too big, because of the dynamic workflow closure it carries (https://github.com/flyteorg/flyte/blob/master/flyteidl/protos/flyteidl/event/event.proto#L151C5-L151C32). We did some work recently to offload literals, and it's on our internal roadmap to explore offloading the dynamic closure as well, but unfortunately that's not a priority at the moment (so I can't promise a solid date yet).
h
@clean-glass-36808, I stand corrected: this feature ended up becoming a priority and is being implemented in https://github.com/flyteorg/flyte/pull/6234 (which should go out in 1.15, due to be released in a few days).
This might not be enough for your use case, though. (By the way, can you give more details about this dynamic task? Its overall structure, number of tasks, etc.)
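To make sure we're picturing the same shape of workload, here's a hypothetical sketch of the pattern I have in mind: a single @dynamic node that compiles one sub-node per item (all names and sizes here are made up, not taken from your workflow):
```python
from typing import List

from flytekit import dynamic, task, workflow


@task
def process(i: int) -> int:
    # stand-in for whatever per-item work the real task does
    return i * 2


@dynamic
def fan_out(n: int) -> List[int]:
    # each iteration adds a node to the dynamically compiled sub-workflow,
    # so the closure reported back in the NodeExecutionEvent grows with n
    return [process(i=i) for i in range(n)]


@workflow
def wf(n: int = 5000) -> List[int]:
    return fan_out(n=n)
```
If yours looks different (nested dynamics, map tasks, large per-node inputs), that detail would help.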
c
Just circling back to report that I actually think this is our nginx ingress being upset. I don't think FlyteAdmin is even seeing the request, so we'll need to tweak our ingress settings.
h
Interesting. Yeah, this makes sense. Let us know, ok?
c
We run the data plane and control plane in different clusters, so the traffic traverses the ingress LB. Yeah, will do.
It was indeed an ingress issue. Resolved with ingress annotation
nginx.ingress.kubernetes.io/proxy-body-size: "100m"
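For anyone who finds this later: the annotation goes on the ingress that fronts FlyteAdmin's gRPC endpoint in our control-plane cluster, roughly like this (hosts, names, and ports are placeholders for our setup, not a reference config):
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flyteadmin-grpc          # placeholder name
  annotations:
    # nginx's default client body limit is 1m; large node events (dynamic closures) blow past it with a 413
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
spec:
  ingressClassName: nginx
  rules:
    - host: flyte.example.com    # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: flyteadmin
                port:
                  number: 81     # gRPC port of the flyteadmin service in our chart; adjust for yours
```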
sorry for the churn
h
No worries. Thanks for following up.