Hello all, my Flyte workflows are running on a k8s cluster. The workflow has 6 nodes, and each node requests 1 CPU, so by the end of the workflow the 6 nodes are requesting 6 CPUs in total. The workflow succeeds, but the CPU requests are not released. This means that after I run this workflow about 20 times, my cluster is already at 120 CPUs requested, and after that the jobs get OOMKilled.
Has anyone run into this? How do I get out of this pickle?
freezing-airport-6809
11/18/2022, 4:41 PM
What do you mean by "not released"?
limited-dog-47035
11/18/2022, 5:03 PM
Do you mean that the pods aren't terminating? Sometimes that happens to us too, and we're not 100% sure why either.
stocky-notebook-88311
11/18/2022, 5:45 PM
Yes, the workflow node pods just say Succeeded, and the CPU request consumption comes down only when we delete the workflow from k8s.
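
One way to check whether the requests that look stuck belong to pods in the Succeeded phase is to total CPU requests per pod phase. This is a minimal sketch, not anything Flyte-specific: it assumes the input is a dict shaped like the output of `kubectl get pods -o json`, and the helper names `parse_cpu` and `cpu_requests_by_phase` are made up for illustration.

```python
from collections import defaultdict


def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "1" or "500m" to cores."""
    if quantity.endswith("m"):
        # The "m" suffix means millicores: 500m == 0.5 cores.
        return int(quantity[:-1]) / 1000
    return float(quantity)


def cpu_requests_by_phase(pod_list: dict) -> dict:
    """Sum container CPU requests (in cores) per pod phase.

    Assumes pod_list has the shape of `kubectl get pods -o json`:
    items[].status.phase and
    items[].spec.containers[].resources.requests.cpu
    """
    totals = defaultdict(float)
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        for container in pod.get("spec", {}).get("containers", []):
            cpu = container.get("resources", {}).get("requests", {}).get("cpu")
            if cpu is not None:
                totals[phase] += parse_cpu(cpu)
    return dict(totals)


# Hypothetical sample data, standing in for real kubectl output.
sample = {
    "items": [
        {
            "status": {"phase": "Succeeded"},
            "spec": {"containers": [{"resources": {"requests": {"cpu": "1"}}}]},
        },
        {
            "status": {"phase": "Running"},
            "spec": {"containers": [{"resources": {"requests": {"cpu": "500m"}}}]},
        },
    ]
}
print(cpu_requests_by_phase(sample))
```

To run it against a real namespace, save the output of `kubectl get pods -n <namespace> -o json` to a file, load it with `json.load`, and pass the result to `cpu_requests_by_phase`. If most of the total sits under `Succeeded`, the number being watched includes completed pods that have not been deleted yet.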