#ask-the-community

Joe Hartshorn

12/18/2023, 2:27 PM
Hi Team, we have implemented resource quotas on the Flyte namespaces so that even when we run a lot of parallel workflows/tasks, the cluster doesn’t fill up completely with Flyte pods and stop other pods from being able to run. It seems that for each task Flyte asks Kubernetes for a pod, Kubernetes refuses because the quota is exhausted, and Flyte just asks again without any kind of backoff. It does that for all pending tasks at once. That puts a lot of load on the Kubernetes control plane (which doesn’t matter much in a way, since AWS provides the control plane and charges a fixed cost), but it also loads anything that has a webhook on the Pods API, so we were seeing crashes and OOMs on things like the Datadog agent and Gatekeeper. Is there any way to enable some kind of backoff so Flyte doesn’t accidentally bombard our other services?
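To make concrete what "backoff" means here, below is a minimal Python sketch of capped exponential backoff with optional jitter. It is not Flyte's implementation or configuration. `QuotaExceeded`, `backoff_delays`, `create_with_backoff`, and the `create_pod` callable are hypothetical names invented for this illustration, and the default delays are arbitrary assumptions.

```python
import random
import time


class QuotaExceeded(Exception):
    """Hypothetical stand-in for the API server rejecting a pod that is over quota."""


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=6, jitter=0.0):
    """Yield one delay per attempt, growing exponentially up to `cap` seconds."""
    for attempt in range(attempts):
        delay = min(cap, base * factor ** attempt)
        # Optionally shave off a random fraction (0 <= jitter <= 1) so that many
        # pending tasks do not all retry at the same instant.
        delay *= 1.0 - jitter * random.random()
        yield delay


def create_with_backoff(create_pod, sleep=time.sleep, **backoff):
    """Call `create_pod` (a hypothetical callable), waiting longer after each quota rejection."""
    for delay in backoff_delays(**backoff):
        try:
            return create_pod()
        except QuotaExceeded:
            sleep(delay)
    raise QuotaExceeded("still over quota after all retries")
```

With `base=1.0`, `factor=2.0`, and `cap=10.0`, the waits are 1, 2, 4, 8, and then 10 seconds, so each pending task sends a handful of create requests per minute instead of retrying immediately, which is what would relieve the admission webhooks.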

Samhita Alla

12/19/2023, 9:37 AM
@Dan Rammer (hamersaw), mind taking a look at this?