clean-glass-36808
04/30/2025, 4:47 PM
Last known status message: AlreadyExists: Event Already Exists, caused by [event has already been sent]
I can't tell whether this indicates a larger issue or whether Flyte Propeller should just be updated to handle AlreadyExists more gracefully. I'm going to dig deeper to understand whether the DB state was updated in Flyte Admin but the gRPC call failed the first time the event was sent.

clean-glass-36808
04/30/2025, 4:58 PM

careful-australia-19356
05/08/2025, 9:10 PM
AlreadyExists errors could be associated with transient communication issues between Propeller and Admin, or between Propeller and the k8s informer. MapTasks are a common theme.

clean-glass-36808
05/09/2025, 8:50 AM

creamy-piano-60645
05/15/2025, 12:27 PM
ErrorOnAlreadyExists for events: https://github.com/flyteorg/flyte/blob/master/flytepropeller/pkg/controller/nodes/array/handler.go#L746
It looks like this was intentional, so that ArrayNode sub-node state is not lost: https://github.com/flyteorg/flyte/pull/5680. In the failed workflows, we see the event sink retries fail continuously:
"msg": "Event version already exists, bumping version and retrying (1/3): [AlreadyExists: Event Already Exists, caused by [event has already been sent]]",
"msg": "Event version already exists, bumping version and retrying (2/3): [AlreadyExists: Event Already Exists, caused by [event has already been sent]]",
"msg": "Event version already exists, bumping version and retrying (3/3): [AlreadyExists: Event Already Exists, caused by [event has already been sent]]",
"msg": "Event version already exists, bumping version and retrying (4/3): [AlreadyExists: Event Already Exists, caused by [event has already been sent]]",
...
This presumably repeats until the workflow fails out completely:
Workflow[...] failed. RuntimeExecutionError: max number of system retry attempts [31/30] exhausted. Last known status message: AlreadyExists: Event Already Exists, caused by [event has already been sent]
One question I have: how was the limit of 3 retries chosen? Is it possible that TaskPhaseVersion is more than 3 versions out of sync? I'll keep looking into this, but any additional context or guidance would be appreciated 🙇

creamy-piano-60645
05/15/2025, 12:45 PM
E0430 15:32:53.086687 1 workers.go:103] error syncing '...': Operation cannot be fulfilled on flyteworkflows.flyte.lyft.com "...": the object has been modified; please apply your changes to the latest version and try again

average-finland-92144
05/15/2025, 8:26 PM

clean-glass-36808
05/17/2025, 1:16 PM