# flyte-support
h
We sometimes see entities missing when trying to fetch them with flyte_remote.
FlyteEntityNotExistException: USER:EntityNotExist: error=None, 
cause=<_InactiveRpcError of RPC that terminated with:
        status = StatusCode.NOT_FOUND
        details = "missing entity of type execution with identifier 
project:"wf-test" domain:"production" 
name:"b5e545e91107""
        debug_error_string = "UNKNOWN:Error received from peer  
{created_time:"2025-01-23T00:04:03.128079309+00:00", grpc_status:5, 
grpc_message:"missing entity of type execution with identifier 
project:\"wf-test\" domain:\"production\" 
name:\"b5e545e91107\""}"
when the entity definitely exists in the UI, and retrying fixes the issue. Any idea what might cause this flakiness?
The entity is also ~1 hour old, so it's not a race condition presumably
a
Do you have logs from flytepropeller and/or flyteadmin? How big is your deployment in terms of number of workflows/concurrent executions?
h
I have logs from propeller, and I didn't see anything suspicious there.
There are maybe O(10) concurrent executions? It's not huge.
Oh here i found something about this workflow in the propeller logs
msg: "Failed to update workflow. Error [Put "https://10.3.80.1:443/apis/flyte.lyft.com/v1alpha1/namespaces/production/flyteworkflows/b5e545e91107?timeout=30s": unexpected EOF]"
a
It'd be good to get metrics from kube-apiserver, because there seems to be some latency there: propeller eventually fails to update the CRD. The timeout mentioned in the error is controlled by this setting: https://docs.flyte.org/en/latest/deployment/configuration/generated/flytepropeller_config.html#kube-client-config-config-kubeclientconfig You could increase the timeout, but again, a better understanding of what's happening in the API server could help.
h
Ok I will talk to our devops tomorrow to see if I can get that. Any particular stats that would be good to look at?
a
maybe workqueue_depth
h
Thanks for the pointers
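Since the thread notes that retrying the fetch fixes the issue, a client-side retry could serve as a stopgap while the kube-apiserver side is investigated. The following is only a minimal sketch, not something proposed in the thread: `fetch_with_retry` and its parameters are hypothetical names, and the commented usage assumes an already configured flytekit `FlyteRemote` object called `remote`. It does not address the underlying cause.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds or the attempts are exhausted.

    Only exceptions listed in retry_on are retried; the delay doubles
    after each failed attempt (base_delay, 2 * base_delay, ...).
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            # Re-raise the original error once the last attempt has failed.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Hypothetical usage (assumes `remote` is a configured flytekit FlyteRemote):
# from flytekit.exceptions.user import FlyteEntityNotExistException
# execution = fetch_with_retry(
#     lambda: remote.fetch_execution(
#         project="wf-test", domain="production", name="b5e545e91107"
#     ),
#     retry_on=(FlyteEntityNotExistException,),
# )
```

Restricting the retry to the not-found exception keeps other failures (authentication, bad arguments) visible immediately instead of hiding them behind the backoff.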