nice-market-38632
11/29/2024, 9:57 AM
{"json":{"exec_id":"***masked-exec-id***","ns":"***masked-namespace***","res_ver":"***masked-ver***","routine":"worker-2","wf":"***masked-workflow-id***:***masked-workflow-id***:map_task.my_map_workflow"},"level":"error","msg":"Error when trying to reconcile workflow. Error [[]]. Error Type[*errors.WorkflowErrorWithCause]","ts":"2024-11-13T08:12:16Z"}
E1113 08:12:16.842540 1 workers.go:103] error syncing '***masked-namespace***/***masked-exec-id***': Workflow[] failed. ErrorRecordingError: failed to publish event, caused by: EventSinkError: Error sending event, caused by [rpc error: code = DeadlineExceeded desc = context deadline exceeded]
{"json":{"exec_id":"***masked-exec-id-2***","ns":"***masked-namespace***","res_ver":"***masked-ver-2***","routine":"worker-3","wf":"***masked-workflow-id***:***masked-workflow-id***:map_task.my_map_workflow"},"level":"warning","msg":"Event recording failed. Error [EventSinkError: Error sending event, caused by [rpc error: code = DeadlineExceeded desc = context deadline exceeded]]","ts":"2024-11-13T08:12:42Z"}
{"json":{"exec_id":"***masked-exec-id-2***","ns":"***masked-namespace***","res_ver":"***masked-ver-2***","routine":"worker-3","wf":"***masked-workflow-id***:***masked-workflow-id***:map_task.my_map_workflow"},"level":"error","msg":"Error when trying to reconcile workflow. Error [[]]. Error Type[*errors.WorkflowErrorWithCause]","ts":"2024-11-13T08:12:42Z"}
E1113 08:12:42.070995 1 workers.go:103] error syncing '***masked-namespace***/***masked-exec-id-2***': Workflow[] failed. ErrorRecordingError: failed to publish event, caused by: EventSinkError: Error sending event, caused by [rpc error: code = DeadlineExceeded desc = context deadline exceeded]
It looks like the connection between flyte-propeller and flyteadmin broke, which caused these timeouts.
A simple pod restart fixed it. This has happened 2-3 times, and a pod restart fixed it every time.
Any suggestions on how to fix this?
I couldn't find a way to set a "keepalive timeout" in the Helm chart or the docs.
freezing-airport-6809
nice-market-38632
12/02/2024, 6:00 AM
nice-market-38632
12/03/2024, 2:44 PM
average-finland-92144
12/04/2024, 10:49 AM
nice-market-38632
12/04/2024, 10:55 AM
<https://pkg.go.dev/google.golang.org/grpc#WithKeepaliveParams>
but I didn't find it used here:
https://github.com/flyteorg/flyte/blob/ba331fd493173682500bb1735bfa760715c64b23/flytepropeller/pkg/controller/controller.go#L315
There should be a WithKeepaliveParams dial option here for the keepalive configuration.