shy-holiday-15500
07/26/2022, 7:20 PM
[0]: code:"ResourceDeletedExternally" message:"resource not found, name [e2e-workflows-development/fb2xnzxy-n2-0-0]. reason: pods \"fb2xnzxy-n2-0-0\" not found"
We then checked the control plane logs, and they suggested the pod was being evicted due to memory pressure (exit code 137 = 128 + 9, i.e. the container was killed with SIGKILL, which is what an OOM kill produces):
"containerStatuses": [
  {
    "name": "fb2xnzxy-n2-0-0",
    "state": {
      "terminated": {
        "exitCode": 137,
        ....
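For anyone reading along, here is a minimal sketch of how an exit code like 137 can be decoded. It assumes the usual convention that a code above 128 means "terminated by signal (code - 128)"; the function name `describe_exit_code` is made up for illustration. Note that 137 by itself only says SIGKILL, so it is consistent with an OOM kill but does not prove one.

```python
import signal


def describe_exit_code(exit_code: int) -> str:
    """Explain a container exit code using the 128 + signal convention."""
    if exit_code > 128:
        signum = exit_code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # Unknown signal number on this platform; fall back to the number.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited on its own with status {exit_code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

On Linux this reports SIGKILL for 137, the signal the kernel OOM killer sends.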
However, when we look at Grafana, we see that memory used is really low, way below requests/limits. What we did find is that the memory cache was quite high. We then found a k8s issue about memory cache being incorrectly counted as "used" memory by the kubelet when it evaluates memory pressure.
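To make the cache point concrete, here is a rough sketch of the arithmetic as described in the Kubernetes node-pressure eviction docs for cgroup v1: the working set is memory usage minus inactive file cache, so active file cache still counts as "used". The helper names and the sample numbers are made up for illustration and are not taken from our cluster.

```python
from typing import Dict

GIB = 1024 ** 3


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of a cgroup v1 memory.stat file."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(usage_bytes: int, stats: Dict[str, int]) -> int:
    """Usage minus inactive file cache; active file cache is NOT subtracted."""
    return max(usage_bytes - stats.get("total_inactive_file", 0), 0)


def available_bytes(capacity_bytes: int, usage_bytes: int, stats: Dict[str, int]) -> int:
    """The memory.available signal that eviction thresholds are compared to."""
    return capacity_bytes - working_set_bytes(usage_bytes, stats)


# Hypothetical node: 16 GiB capacity, 14 GiB usage, of which only 3 GiB is RSS.
SAMPLE_STAT = f"""total_rss {3 * GIB}
total_active_file {9 * GIB}
total_inactive_file {2 * GIB}
"""

if __name__ == "__main__":
    stats = parse_memory_stat(SAMPLE_STAT)
    print(working_set_bytes(14 * GIB, stats) // GIB, "GiB working set")
    print(available_bytes(16 * GIB, 14 * GIB, stats) // GIB, "GiB available")
```

With these made-up numbers the process memory is only 3 GiB, yet the working set comes out at 12 GiB because the 9 GiB of active file cache is included, which matches the "low usage in Grafana, high cache, still under pressure" picture.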
Not quite a Flyte issue, more of a k8s issue, but the log was a bit mysterious and we're still figuring out a resolution.
shy-holiday-15500
07/26/2022, 7:20 PM
nice-zebra-99977
07/26/2022, 7:21 PM
shy-holiday-15500
07/26/2022, 7:24 PM
freezing-airport-6809
freezing-airport-6809
freezing-airport-6809
freezing-airport-6809
freezing-airport-6809
shy-holiday-15500
07/26/2022, 9:06 PM
shy-holiday-15500
07/26/2022, 9:08 PM
shy-holiday-15500
07/26/2022, 9:12 PM
freezing-airport-6809
high-accountant-32689
07/27/2022, 5:12 PM
hallowed-mouse-14616
07/27/2022, 6:20 PM
nice-zebra-99977
07/27/2022, 7:14 PM
nice-zebra-99977
07/27/2022, 7:16 PM
shy-holiday-15500
07/27/2022, 8:23 PM
shy-holiday-15500
07/27/2022, 8:23 PM
shy-holiday-15500
07/27/2022, 8:23 PM
shy-holiday-15500
07/27/2022, 8:23 PM
shy-holiday-15500
07/27/2022, 8:44 PM
shy-holiday-15500
07/27/2022, 9:35 PM
--memory and appears to reserve a bunch of memory if you don't override defaults. We're still tracing the source code to figure out what this flag does (and why the RAM is showing up as cached), but it definitely doesn't seem to be a Flyte problem.
shy-holiday-15500
07/27/2022, 9:35 PM
freezing-airport-6809
freezing-airport-6809
freezing-airport-6809
shy-holiday-15500
07/27/2022, 9:44 PM
shy-holiday-15500
07/27/2022, 9:44 PM
shy-holiday-15500
07/27/2022, 9:44 PM
shy-holiday-15500
07/27/2022, 9:44 PM
shy-holiday-15500
07/27/2022, 9:44 PM
shy-holiday-15500
07/27/2022, 9:45 PM
freezing-airport-6809
shy-holiday-15500
07/27/2022, 9:45 PM
freezing-airport-6809
freezing-airport-6809
shy-holiday-15500
07/27/2022, 9:46 PM
thousands-area-8239
07/29/2022, 4:58 PM
I0729 16:10:23.041121 4106 kuberuntime_manager.go:484] "No sandbox for pod can be found. Need to start a new one" pod="e2e-workflows-development/fnrte65a-n3-0-108"
but no other meaningful logs between that and the time it is deleted and removed.
We’ve included screenshots of the finalizers in the configmap and being applied to the pods, as well as the ResourceDeletedExternally error. Any thoughts on what could be happening here or where else we could look for insight?
freezing-airport-6809
thousands-area-8239
07/29/2022, 6:24 PM
ResourceDeletedExternally from the Flyte console. We are assuming this indicates that flytepropeller is unable to gather the logs and the pod is being cleaned up by the kubelet, despite the finalizers.
hallowed-mouse-14616
07/29/2022, 6:35 PM
shy-holiday-15500
07/29/2022, 7:14 PM
shy-holiday-15500
08/18/2022, 11:24 AM
freezing-airport-6809
freezing-airport-6809
shy-holiday-15500
08/18/2022, 2:40 PM
shy-holiday-15500
08/18/2022, 2:40 PM
freezing-airport-6809
shy-holiday-15500
08/18/2022, 4:03 PM