Hello everyone! We've been scaling up our workflows recently and running into an ongoing issue with memory usage in the flyte-binary (v1.13.3), so I've been investigating.
As the workflows run, the memory usage of the flyte-binary pod steadily increases, sometimes exceeding its limit and crashing the pod. Some growth while workflows are running seems expected to me, and we can try raising the pod's memory limit to mitigate those crashes. However, I noticed that some memory doesn't appear to be released after a workflow finishes, which means that no matter how much memory we allocate to the pod, it will eventually crash.
If anyone has any workarounds or fixes for this, I'd be grateful.
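In case it helps anyone reproduce or quantify this, here is a rough Python sketch of how the growth could be estimated from memory readings taken while the pod should be idle (between workflow runs). The function names and the sample numbers are hypothetical, not real measurements from our cluster, and it assumes the growth is roughly linear, which may not hold in practice.

```python
def growth_rate(samples):
    """Least-squares slope of memory over time.

    samples: list of (seconds, mib) readings, hypothetical values
    collected however you like (e.g. from your metrics dashboard).
    Returns MiB per second.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den


def seconds_until_limit(samples, limit_mib):
    """Estimate time left before the latest reading grows to limit_mib.

    Returns None when memory is flat or shrinking (no leak signal).
    """
    rate = growth_rate(samples)
    if rate <= 0:
        return None
    latest = samples[-1][1]
    return max(0.0, (limit_mib - latest) / rate)


if __name__ == "__main__":
    # Made-up idle readings: (seconds since start, MiB in use).
    idle = [(0, 100), (10, 200), (20, 300)]
    print(growth_rate(idle))
    print(seconds_until_limit(idle, 500))
    print(seconds_until_limit(idle, 1000))
```

If the rate stays positive while the pod is idle, raising the limit only pushes the estimated crash time further out rather than removing it, which matches what we're seeing.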