Blake Jackson
11/13/2023, 7:15 PM
We upgraded to v1.10.0 and are seeing flyteadmin CPU hover around 99% consistently. We've increased the CPU multiple times, each time resulting in the same behavior. Currently we have the CPU set to 3, but what's strange is that before (on v1.9.0) we were running on less than 500m. Does anyone have any thoughts as to what could be causing this? Should we look into anything in particular to performance tune? Is there a recommended resources config someone could point me to?

kubectl top:
NAME                         CPU(cores)   MEMORY(bytes)
flyteadmin-c75645575-qprvl   3229m        421Mi
flyteadmin-c75645575-xcq8b   2m           67Mi
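As context for the resources question, one way to line up the configured CPU requests/limits with the live usage above is a sketch like the following; the flyte namespace and the app=flyteadmin label selector are assumptions, so adjust them to whatever the actual deployment uses.

# Sketch: show configured CPU requests/limits next to live usage.
# Namespace (flyte) and label selector (app=flyteadmin) are assumptions.
kubectl -n flyte get pods -l app=flyteadmin \
  -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[0].resources.requests.cpu,CPU_LIM:.spec.containers[0].resources.limits.cpu'
kubectl -n flyte top pods -l app=flyteadmin
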
Ketan (kumare3)

Blake Jackson
11/14/2023, 2:17 AM

Ketan (kumare3)

Eduardo Apolinario (eapolinario)
11/14/2023, 3:22 AM

Blake Jackson
11/14/2023, 2:23 PM
We saw the SLOW SQL >= 200ms warning, and the logs were much more populated because of that. During this same time period the DB performed nominally, showing hardly any load, and the top waits (which were minimal) were on SELECT * FROM "tags" WHERE ("tags"."artifact_id","tags"."dataset_uuid") IN (($1,$2)).

The strangest part is that we recreated the pods multiple times, and every time the new pod went right back to 100% CPU. Stranger still, while I was profiling, the CPU finally dropped (see image). I do have one CPU profile from before and one from this morning that I can share, but unfortunately nothing stands out to me. I'm attaching the flame graphs as images as well.

Ketan (kumare3)

Yee
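As context for the CPU profiles and flame graphs mentioned above, one way to capture such a profile from a running pod is sketched below; the namespace, the pod name (reused from the kubectl top output above), and the profiler port 10254 are assumptions, and this only works if the deployment exposes the Go pprof endpoints.

# Sketch: grab a 30s CPU profile from a flyteadmin pod and open it as a flame graph.
# The namespace, pod name, and profiler port (10254) are assumptions; verify that
# your deployment actually exposes the Go pprof endpoints before relying on this.
kubectl -n flyte port-forward pod/flyteadmin-c75645575-qprvl 10254:10254 &
go tool pprof -http=:8080 'http://localhost:10254/debug/pprof/profile?seconds=30'
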
Blake Jackson
11/14/2023, 3:09 PM
Is a truncated statement like UPDATE "executions" ... enough, or do you need to see the entire SQL statement? If the former, I can grab a few more. If the latter, I need to confirm there's nothing in there I can't share.

SELECT * FROM "task_executions" WHERE "task_executions"."project" = ...
UPDATE "node_executions" SET ...
SELECT * FROM "executions" WHERE "executions"."execution_project"...

Also, {grpc_method="GetExecutionData", grpc_service="flyteidl.service.AdminService"} and {grpc_method="GetExecution", grpc_service="flyteidl.service.AdminService"} were higher.
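To put numbers behind that "were higher" observation, a per-method rate query against Prometheus could look like the sketch below; the metric name grpc_server_handled_total (from the standard go-grpc-prometheus middleware) and the Prometheus address are assumptions, so adjust both to the actual setup.

# Sketch: per-method AdminService request rates over the last 5 minutes.
# Assumptions: Prometheus reachable at http://prometheus:9090 and the standard
# grpc_server_handled_total metric exported by go-grpc-prometheus.
curl -sG 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=sum by (grpc_method) (rate(grpc_server_handled_total{grpc_service="flyteidl.service.AdminService"}[5m]))'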