jolly-florist-66728
10/21/2024, 7:42 PM
RuntimeExecutionError: max number of system retry attempts [31/30] exhausted. Last known status message: failed at Node[dn0]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [k8s-array]:
Operation cannot be fulfilled on pods "f82e2be5d27cc00fc000-n7-0-dn0-0-2427": the object has been modified; please apply your changes to the latest version and try again
Flyte backend: 1.13.1
# Answer
This error typically occurs when something else modifies a Kubernetes resource, here a pod, while Flyte is trying to update it. Kubernetes uses optimistic concurrency: each object carries a resourceVersion, and an update submitted against an older version is rejected with a conflict so that it cannot silently overwrite the newer state. The message above shows that the k8s-array plugin kept hitting this conflict on the same pod until the system retry budget was exhausted.
According to Flyte GitHub Issue #5273, a potential solution is to fetch the latest pod state from the Kubernetes API before retrying, so that Flyte operates on the current version of the pod rather than a stale copy. Adding a `Pod.Get()` operation to refresh the pod's state before each retry or update might help resolve this issue.
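The refresh-then-retry idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not use Flyte or a Kubernetes client, and `FakeApiServer`, `ConflictError`, `Pod`, and `update_with_refresh` are hypothetical names invented for this example. The fake server mimics the version check that produces the "object has been modified" error, and the helper re-reads the object before every attempt.

```python
from dataclasses import dataclass, field, replace


class ConflictError(Exception):
    """Stands in for the conflict Kubernetes returns on a stale update."""


@dataclass(frozen=True)
class Pod:
    # Hypothetical, heavily simplified pod: just a name, a version, and labels.
    name: str
    resource_version: int = 1
    labels: dict = field(default_factory=dict)


class FakeApiServer:
    """In-memory stand-in for the API server (illustration only)."""

    def __init__(self):
        self._pods = {}

    def create(self, pod):
        self._pods[pod.name] = pod

    def get(self, name):
        return self._pods[name]

    def update(self, pod):
        current = self._pods[pod.name]
        # Reject writes that were computed from an older version.
        if pod.resource_version != current.resource_version:
            raise ConflictError(f'pods "{pod.name}": the object has been modified')
        stored = replace(pod, resource_version=current.resource_version + 1)
        self._pods[pod.name] = stored
        return stored


def update_with_refresh(api, name, mutate, max_attempts=5):
    """Re-read the object before each attempt, then apply `mutate` and update."""
    for attempt in range(1, max_attempts + 1):
        latest = api.get(name)  # refresh: never reuse a cached copy
        try:
            return api.update(mutate(latest)), attempt
        except ConflictError:
            continue  # another writer got in first; loop and re-read
    raise ConflictError(f"gave up on {name!r} after {max_attempts} attempts")


api = FakeApiServer()
api.create(Pod(name="example-pod"))
calls = {"n": 0}


def mutate_with_race(pod):
    calls["n"] += 1
    if calls["n"] == 1:
        # Simulate another controller writing between our get and our update.
        api.update(replace(api.get(pod.name), labels={"owner": "other-controller"}))
    return replace(pod, labels={**pod.labels, "phase": "done"})


result, attempts = update_with_refresh(api, "example-pod", mutate_with_race)
print(attempts, result.resource_version, result.labels)
# 2 3 {'owner': 'other-controller', 'phase': 'done'}
```

The first attempt fails because the simulated concurrent write bumps the version; the second attempt succeeds because it starts from a fresh read and therefore also preserves the other writer's label. For Go code that talks to a real cluster, client-go ships a `retry.RetryOnConflict` helper in `k8s.io/client-go/util/retry` that wraps the same pattern; whether the k8s-array plugin can adopt it directly is not something this thread establishes.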
Would you like more detailed guidance on how to potentially implement this fix or further investigate this issue in the Flyte codebase?
Sources:
- Flyte GitHub Issue #5273
Slack Link
https://.slack.com/archives/CP2HDHKE1/p1729539720.564639