f

Felix Ruess

04/06/2023, 12:08 PM
Hi all, does anyone know why CPU limits if omitted are set to CPU requests? I need to run my pods with CPU requests but no limit.... https://github.com/flyteorg/flyte/issues/3574
m

Michael Tinsley

04/06/2023, 12:24 PM
The limit in this case will be set by the deployment defaults I think. I remember seeing talk about the deployment defaults being removed but not sure if that has happened yet
d

David Espejo (he/him)

04/06/2023, 12:36 PM
f

Felix Ruess

04/06/2023, 1:10 PM
IMHO this should be changed then.... If I explicitly pass resources, I want exactly those applied (and the ones I omitted left unset)
k

Ketan (kumare3)

04/06/2023, 1:31 PM
We see a lot of unpredictability when we keep the limits floating
Cc @Yee and @katrina to verify that we indeed make limit match requests
f

Felix Ruess

04/06/2023, 2:58 PM
I just see a lot of unnecessary throttling when I have CPU limits (but I definitely want/need mem limits!). Especially since my tasks are usually GPU tasks and I only have one running per node/server... See also for background: https://home.robusta.dev/blog/stop-using-cpu-limits
So it would be fine if the default limits are applied if limits are not specified in the task at all, but if I have limits in the task (e.g. only gpu and mem limits) exactly these should be used and not the "missing" cpu limit filled with the default limit
k

katrina

04/06/2023, 4:23 PM
we do indeed inject limits=requests when requests are set but not limits. it might be worthwhile to explore a notion of an explicit or nullable limit value to allow disabling the default limit injection
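A minimal sketch of the defaulting described here, assuming requests and limits are plain dicts. `inject_limits` is a hypothetical helper for illustration only, not Flyte's actual code:

```python
def inject_limits(requests, limits):
    """Fill every limit the user left out with the matching request.

    Hypothetical illustration of the described behaviour, not Flyte's code.
    """
    merged = dict(limits)
    for name, value in requests.items():
        # A resource with a request but no limit ends up with limit == request.
        merged.setdefault(name, value)
    return merged


requests = {"cpu": "2", "mem": "1Gi"}
limits = {"mem": "2Gi"}  # CPU limit deliberately left out
print(inject_limits(requests, limits))
```

Under this assumption the omitted CPU limit comes back as "2", which is the behaviour reported at the top of the thread.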
f

Felix Ruess

04/06/2023, 4:32 PM
IMHO injecting the whole requests as limits would be fine if there are no limits defined, but filling specific missing limits is not...
so you still have the convenience of only specifying requests explicitly but also the option to omit some limits
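Felix's proposal could be sketched roughly as follows, again with plain dicts. `inject_limits_all_or_nothing` is a hypothetical name and nothing in this thread says Flyte implements it:

```python
def inject_limits_all_or_nothing(requests, limits):
    """Copy requests into limits only when no limits were given at all.

    Hypothetical sketch of the proposal in this thread, not Flyte's code.
    """
    if not limits:
        # No limits specified: keep the convenience of limits == requests.
        return dict(requests)
    # Some limits specified: use exactly those and fill nothing in.
    return dict(limits)


print(inject_limits_all_or_nothing({"cpu": "2", "mem": "1Gi"}, {}))
print(inject_limits_all_or_nothing({"cpu": "2", "mem": "1Gi"}, {"mem": "2Gi"}))
```

The second call leaves CPU without a limit because the user gave a limit for memory only.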
m

Michael Tinsley

04/06/2023, 4:34 PM
I think an explicit resource=None, e.g.
    requests=Resources(cpu="2", mem="1Gi"),
    limits=Resources(mem="2Gi", cpu=None),
would make the most sense in this situation? I think global defaults make sense still
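For an explicit None to work, "not specified" has to be distinguishable from "explicitly no limit". One common way to do that in Python is a sentinel default. The sketch below uses a hypothetical `UNSET` sentinel and `resolve_limit` helper; whether flytekit's Resources can tell the two cases apart is not stated in this thread:

```python
UNSET = object()  # hypothetical sentinel meaning "the user said nothing"


def resolve_limit(request, limit=UNSET):
    """Return the effective limit for a single resource.

    Hypothetical sketch: UNSET falls back to the request, while an
    explicit None means "apply no limit".
    """
    if limit is UNSET:
        return request  # default injection: limit == request
    return limit  # an explicit value, or None to disable the limit


print(resolve_limit("2"))           # limit omitted
print(resolve_limit("2", None))     # limit explicitly disabled
print(resolve_limit("1Gi", "2Gi"))  # limit explicitly set
```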
f

Felix Ruess

04/06/2023, 4:36 PM
That would also work for me, but I still can't think of a case where I want only some limits populated by the requests
IMHO my proposal is also easier to reason about
d

David Espejo (he/him)

04/06/2023, 4:41 PM
@Felix Ruess would you like to start a discussion in the RFC Incubator? https://github.com/flyteorg/flyte/discussions/categories/rfc-incubator
f

Felix Ruess

04/06/2023, 4:45 PM
@David Espejo (he/him) if that makes more sense than the issue, I can do that... But I'll be on vacation 🪂 starting tonight until the 17th, so it will have to wait...
d

David Espejo (he/him)

04/06/2023, 5:05 PM
Well, 1. Thanks for the bug report. 2. A proposal that could potentially change the user experience (for the better) falls into the domain of an RFC. 3. Enjoy your vacation!
f

Felix Ruess

04/06/2023, 5:08 PM
ok, will do! Thanks!
k

Ketan (kumare3)

04/17/2023, 2:26 PM
Cc @jeev and @Eduardo Apolinario (eapolinario) we’re just looking at it
j

jeev

04/17/2023, 2:27 PM
this one is a slightly different issue i think
f

Felix Ruess

04/17/2023, 2:28 PM
@jeev what do you mean, how so?
j

jeev

04/17/2023, 2:29 PM
the error we were looking at last night is slightly different i mean. but definitely related. setting limit to request (if former is not specified) is default flyte behavior but intended to achieve “guaranteed” QoS on k8s i believe. we were seeing an error where setting a task resource limit above the platform limit was causing the actual value to be set to the platform limit.
i also think we could consider dropping cpu limits more broadly 😅
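For context on the QoS point: Kubernetes assigns a pod the Guaranteed class only when every container has CPU and memory limits equal to its requests. A simplified sketch for a single-container pod, using a hypothetical `qos_class` helper that compares quantities as plain strings and ignores other resources such as GPUs:

```python
def qos_class(requests, limits):
    """Approximate the Kubernetes QoS class of a single-container pod.

    Simplified illustration: quantities are compared as strings, so
    "1" and "1000m" would wrongly count as different values.
    """
    if not requests and not limits:
        return "BestEffort"
    # Kubernetes defaults a missing request to the limit, not the reverse.
    effective_requests = {**limits, **requests}
    guaranteed = all(
        name in limits and effective_requests[name] == limits[name]
        for name in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


both = {"cpu": "2", "memory": "1Gi"}
print(qos_class(both, both))               # limits match requests
print(qos_class(both, {"memory": "1Gi"}))  # CPU limit dropped
```

In this model, dropping only the CPU limit moves the pod from Guaranteed to Burstable, which is the trade-off between the two positions in this thread.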
f

Felix Ruess

04/17/2023, 2:33 PM
j

jeev

04/17/2023, 2:34 PM
no sorry, i haven’t gotten around to adding the issues yet. will do at some point today. it’s tax day tomorrow 😅
f

Felix Ruess

04/17/2023, 2:36 PM
From your description it sounds exactly like 3065, but it is supposedly fixed...
Ah, so you mean instead of rejecting it, it would set the previously unspecified limit to the platform limit instead of project/task limit?
j

jeev

04/17/2023, 2:39 PM
i’m struggling to find the right words now. i’ll add some code snippets into the issue that will better illustrate the problem.