# flyte-support
w
Hello everyone! We've been scaling up our workflows recently and running into an ongoing issue with memory usage in the flyte-binary (v1.13.3), so I've been investigating. As the workflows run, the memory of the flyte-binary pod steadily increases, sometimes exceeding its limits and crashing. I think this is expected, and we can try to increase the memory available to mitigate the crashes. However, I noticed that when the workflow finishes, it looks like some memory isn't released, which means no matter how much memory we allocate to the pod, it will eventually crash. If anyone has any workarounds or fixes for this I'd be grateful.
f
Is that 40GB?
That does not make sense
w
Yes, this was the other thing I was unsure about, the memory usage seems very high, but I don't really have any context for what it should be.
f
something is wrong, maybe prometheus metrics are bloating memory
can you turn off metrics and see?
w
Does flyte-binary have metrics enabled by default? I haven't enabled them explicitly - the plots above just come from running `top pods`.
f
i think it does have them on by default
we should probably turn them off