clean-glass-36808
04/30/2025, 4:47 PM
Last known status message: AlreadyExists: Event Already Exists, caused by [event has already been sent]
I can't tell if this indicates a larger issue or if Flyte Propeller should just be updated to handle AlreadyExists more gracefully. Going to dig deeper into this to understand whether DB state was updated in Flyte Admin but the gRPC call failed the first time the event was sent.
clean-glass-36808
04/30/2025, 4:58 PM
careful-australia-19356
05/08/2025, 9:10 PM
Already Exists, but these could be associated with transient communication issues between propeller and admin, or between propeller and the k8s informer. MapTasks are a common theme.
clean-glass-36808
05/09/2025, 8:50 AM
creamy-piano-60645
05/15/2025, 12:27 PM
ErrorOnAlreadyExists for events: https://github.com/flyteorg/flyte/blob/master/flytepropeller/pkg/controller/nodes/array/handler.go#L746
It looks like this was intentional so ArrayNode sub-node state is not lost: https://github.com/flyteorg/flyte/pull/5680. During the failed workflows we see the event sink retries fail continuously:
"msg": "Event version already exists, bumping version and retrying (1/3): [AlreadyExists: Event Already Exists, caused by [event has already been sent]]",
"msg": "Event version already exists, bumping version and retrying (2/3): [AlreadyExists: Event Already Exists, caused by [event has already been sent]]",
"msg": "Event version already exists, bumping version and retrying (3/3): [AlreadyExists: Event Already Exists, caused by [event has already been sent]]",
"msg": "Event version already exists, bumping version and retrying (4/3): [AlreadyExists: Event Already Exists, caused by [event has already been sent]]",
...
This presumably repeats until the workflow fails out completely:
Workflow[...] failed. RuntimeExecutionError: max number of system retry attempts [31/30] exhausted. Last known status message: AlreadyExists: Event Already Exists, caused by [event has already been sent]
One question I have is how the limit of 3 retries was chosen. Is it possible the TaskPhaseVersion is > 3 versions out of sync? Going to continue looking into this, but any additional context or guidance would be appreciated 🙇
creamy-piano-60645
05/15/2025, 12:45 PM
E0430 15:32:53.086687 1 workers.go:103] error syncing '...': Operation cannot be fulfilled on flyteworkflows.flyte.lyft.com "...": the object has been modified; please apply your changes to the latest version and try again
average-finland-92144
05/15/2025, 8:26 PM
clean-glass-36808
05/17/2025, 1:16 PM