Hi all, does anyone know why CPU limits if omitted...
# ask-the-community
f
Hi all, does anyone know why CPU limits if omitted are set to CPU requests? I need to run my pods with CPU requests but no limit.... https://github.com/flyteorg/flyte/issues/3574
m
The limit in this case will be set by the deployment defaults I think. I remember seeing talk about the deployment defaults being removed but not sure if that has happened yet
d
f
IMHO this should be changed then.... If I explicitly pass resources, I want exactly those applied (and the ones I didn't specify left unset)
k
We see a lot of unpredictability when we keep the limits floating
Cc @Yee and @katrina to verify that we indeed make limit match requests
f
I just see a lot of unnecessary throttling when I have CPU limits (but I definitely want/need mem limits!). Especially since my tasks are usually GPU tasks and I only have one running per node/server... See also for background: https://home.robusta.dev/blog/stop-using-cpu-limits
So it would be fine if the default limits are applied if limits are not specified in the task at all, but if I have limits in the task (e.g. only gpu and mem limits) exactly these should be used and not the "missing" cpu limit filled with the default limit
k
we do indeed inject limits=requests when requests are set but not limits. it might be worthwhile to explore a notion of an explicit or nullable limit value to allow disabling the default limit injection
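To make the behavior described above concrete, here is a minimal sketch of per-resource limit injection. It uses plain dicts and a hypothetical helper name (`fill_limits_per_resource`); it is not Flyte's actual implementation, only an illustration of "limits=requests when requests are set but not limits" as stated in this thread.

```python
from typing import Dict


def fill_limits_per_resource(
    requests: Dict[str, str], limits: Dict[str, str]
) -> Dict[str, str]:
    """Hypothetical illustration: copy a request into limits for every
    resource whose limit was not specified."""
    merged = dict(limits)
    for name, value in requests.items():
        # Only fills the gap; an explicitly set limit is left untouched.
        merged.setdefault(name, value)
    return merged


requests = {"cpu": "2", "mem": "1Gi"}
limits = {"mem": "2Gi"}  # the user deliberately left out a CPU limit
effective = fill_limits_per_resource(requests, limits)
print(effective)  # {'mem': '2Gi', 'cpu': '2'}
```

Under this scheme the omitted CPU limit comes back as `cpu="2"`, which is the result the original question is about.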
f
IMHO injecting the whole requests as limits would be fine if there are no limits defined, but filling in specific missing limits is not...
so you still have the convenience of only specifying requests explicitly but also the option to omit some limits
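A rough sketch of the "all or nothing" rule proposed here, again with plain dicts and a hypothetical helper name (`fill_limits_all_or_nothing`). It is only meant to show the intended semantics, not how Flyte would implement it.

```python
from typing import Dict


def fill_limits_all_or_nothing(
    requests: Dict[str, str], limits: Dict[str, str]
) -> Dict[str, str]:
    """Hypothetical illustration: reuse requests as limits only when the
    user specified no limits at all; otherwise apply limits exactly as given."""
    if not limits:
        return dict(requests)
    return dict(limits)


requests = {"cpu": "2", "mem": "1Gi"}

# No limits given: requests are reused as limits (the convenience case).
only_requests = fill_limits_all_or_nothing(requests, {})

# Some limits given: nothing is filled in, so there is no CPU limit.
partial_limits = fill_limits_all_or_nothing(requests, {"mem": "2Gi", "gpu": "1"})

print(only_requests)
print(partial_limits)
```

The trade-off is that a partially specified limits block is taken literally, so a user who wants a CPU limit has to spell it out once any other limit is set.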
m
I think an explicit resource=None, e.g.:
requests=Resources(cpu="2", mem="1Gi"),
limits=Resources(mem="2Gi", cpu=None),
would make the most sense in this situation? I think global defaults still make sense.
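One way the explicit `None` idea could be modeled is with a sentinel that separates "not specified" from "explicitly no limit". The sketch below uses a hypothetical `ResourcesSketch` dataclass and `UNSET` sentinel as stand-ins; it is not flytekit's `Resources` class, and the resolution rule is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict


class _Unset:
    """Sentinel type meaning 'the user did not mention this resource'."""


UNSET = _Unset()


@dataclass
class ResourcesSketch:
    # Hypothetical stand-in for a resources spec; UNSET differs from None.
    cpu: Any = UNSET
    mem: Any = UNSET


def effective_limits(requests: ResourcesSketch, limits: ResourcesSketch) -> Dict[str, str]:
    result: Dict[str, str] = {}
    for name in ("cpu", "mem"):
        limit = getattr(limits, name)
        if limit is None:
            continue  # explicit None: the user opted out of this limit
        if limit is UNSET:
            limit = getattr(requests, name)  # fall back to the request
            if limit is UNSET or limit is None:
                continue  # nothing to inject
        result[name] = limit
    return result


opted_out = effective_limits(
    ResourcesSketch(cpu="2", mem="1Gi"),
    ResourcesSketch(mem="2Gi", cpu=None),
)
defaulted = effective_limits(
    ResourcesSketch(cpu="2", mem="1Gi"),
    ResourcesSketch(mem="2Gi"),
)
print(opted_out)   # {'mem': '2Gi'}
print(defaulted)   # {'cpu': '2', 'mem': '2Gi'}
```

With this shape, `cpu=None` disables the injected CPU limit while leaving `cpu` out keeps today's limits=requests behavior.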
f
That would also work for me, but I still can't think of a case where I want only some limits populated by the requests
IMHO my proposal is also easier to reason about
d
@Felix Ruess would you like to start a discussion in the RFC Incubator? https://github.com/flyteorg/flyte/discussions/categories/rfc-incubator
f
@David Espejo (he/him) if that makes more sense than the issue, I can do that... But I'll be on vacation 🪂 starting tonight until the 17th, so it will have to wait...
d
Well, 1. Thanks for the bug report 2. A proposal that could potentially change (for the better) the user experience falls into the domain of an RFC 3. Enjoy your vacation!
f
ok, will do! Thanks!
k
Cc @jeev and @Eduardo Apolinario (eapolinario) we're just looking at it
j
this one is a slightly different issue i think
f
@jeev what do you mean, how so?
j
the error we were looking at last night is slightly different i mean. but definitely related. setting limit to request (if former is not specified) is default flyte behavior but intended to achieve “guaranteed” QoS on k8s i believe. we were seeing an error where setting a task resource limit above the platform limit was causing the actual value to be set to the platform limit.
i also think we could consider dropping cpu limits more broadly 😅
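For background on the “guaranteed” QoS point above: Kubernetes assigns a pod the Guaranteed class only when every container has CPU and memory limits that equal its requests. Here is a simplified sketch of that rule for a single container. The function name `qos_class` is hypothetical, and values are compared as raw strings, so `"2"` and `"2000m"` would count as different here.

```python
from typing import Dict


def qos_class(requests: Dict[str, str], limits: Dict[str, str]) -> str:
    """Simplified single-container version of the Kubernetes QoS rules."""
    tracked = ("cpu", "memory")
    if not any(name in requests or name in limits for name in tracked):
        return "BestEffort"  # no CPU or memory requests or limits at all
    if all(
        # A missing request defaults to the limit, as Kubernetes does.
        name in limits and requests.get(name, limits[name]) == limits[name]
        for name in tracked
    ):
        return "Guaranteed"
    return "Burstable"


with_cpu_limit = qos_class(
    {"cpu": "2", "memory": "1Gi"}, {"cpu": "2", "memory": "1Gi"}
)
without_cpu_limit = qos_class({"cpu": "2", "memory": "1Gi"}, {"memory": "1Gi"})
print(with_cpu_limit)     # Guaranteed
print(without_cpu_limit)  # Burstable
```

So dropping the CPU limit, as asked for in this thread, moves the pod from Guaranteed to Burstable even when the memory limit still matches the request.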
f
j
no sorry, i haven't gotten around to adding the issues yet. will do at some point today. it's tax day tomorrow 😅
f
From your description it sounds exactly like 3065, but it is supposedly fixed...
Ah, so you mean instead of rejecting it, it would set the previously unspecified limit to the platform limit instead of project/task limit?
j
i'm struggling to find the right words now. i'll add some code snippets into the issue that will better illustrate the problem.