# flyte-support
r
how best can one set the GPU quota for a project/domain in flyte? 🧵
First of all, the symbol `projectQuotaGpu` doesn't appear to exist in the flyte GitHub repo. Second, there's a K8s way to set resource quotas for a namespace, but that just makes K8s push back rather than Flyte propeller: https://kubernetes.io/docs/concepts/policy/resource-quotas/ (I want the propeller to push back, because I might have multiple projects and because I want Flyte to properly show a job as queued vs. running, i.e. dequeued to K8s). Third, there's this old thread https://discuss.flyte.org/t/13550344/hello-nice-to-meet-you-all-slightly-smiling-face-i-was-looki#a1644d79-90d6-4cbf-ac73-aaf76e3c2b40 but again that seems to use the non-existent symbol `projectQuotaGpu`.
Hmmm actually, will an execution ever show the queued state? I tried launching several tasks that each request 90% of the project's `projectQuotaCpu`, and the Flyte WebUI put them all in the "Running" state, though I can see the pods are obviously "Pending". Does this all only work in the non-single-binary install of Flyte?
I've only ever seen "queued" reported properly with `map_task`. But here I need the workflow executions to show "queued" properly.
Also, what are the units supposed to be for `projectQuotaCpu`? I would think whole CPUs, but when I play around with it, it seems it might actually be millicores?
@average-finland-92144 any ideas maybe?
a
hey @rapid-artist-48509 you could use templates for the ClusterResourceManager, which accept specs of K8s objects, like a ResourceQuota: https://github.com/flyteorg/flyte/blob/f2a1ad7d49a8b0429d4b51e42c56edd0a8bb5666/charts/flyte/values.yaml#L655-L666 I haven't tried it, but AFAICT the keys you use there (like `projectQuotaCpu`) are arbitrary, so you could define `projectQuotaGpu`.
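Untested, but roughly what I have in mind based on the structure in that values.yaml (the `projectQuotaGpu` key, the domain name, and the values are illustrative):

```yaml
cluster_resource_manager:
  enabled: true
  config:
    cluster_resources:
      customData:
        - production:                    # domain name, illustrative
            - projectQuotaCpu:
                value: "8"
            - projectQuotaMemory:
                value: "16Gi"
            - projectQuotaGpu:           # arbitrary key; only has to match the template below
                value: "2"
  templates:
    - key: ab_project_resource_quota
      value: |
        apiVersion: v1
        kind: ResourceQuota
        metadata:
          name: project-quota
          namespace: {{ namespace }}
        spec:
          hard:
            limits.cpu: {{ projectQuotaCpu }}
            limits.memory: {{ projectQuotaMemory }}
            requests.nvidia.com/gpu: {{ projectQuotaGpu }}
```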
r
Thanks! Hmm, those are K8s quotas, which I don't need (I don't have e.g. autoscalers). I wonder if Flyte / Flyte Propeller will observe those limits though and queue accordingly? I tried via the `flytectl update cluster-resource-attribute --attrFile cra.yaml` route and:
• I couldn't see how to spec GPUs ... maybe it's just the nvidia gpu tag?
• I believe I was using K8s units as expected, but yeah, things didn't work
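For reference, the `cra.yaml` I tried looked roughly like this (a sketch; project/domain names and values are illustrative, and the `projectQuotaGpu` key is my own guess, since the keys only matter insofar as the cluster resource templates reference them):

```yaml
# cra.yaml -- sketch only
domain: development
project: myproject            # hypothetical project name
attributes:
  projectQuotaCpu: "10"
  projectQuotaMemory: "64Gi"
  projectQuotaGpu: "2"        # guessed key; only meaningful if a template uses it
```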
I tried this again with a project `projectQuotaCpu` of 10 and tasks/workflows that each had a request/limit of 9 CPUs. The workflows do NOT queue as expected; they go straight to K8s, where they hang around in `NotReady` for a while and then eventually get run. Does all this NOT work for the flyte-single-binary deployment? Also, I see no code in the `flyte` repo for project GPU quotas, so is this not an OSS feature?
a
Hey Paul So I don't think propeller enforces ResourceQuotas, otherwise it would become a scheduler layer on top of the K8s scheduling logic. As mentioned, you can ask the K8s API server to create a ResourceQuota through Flyte, but Pod placement decisions in response to resource availability are K8s' responsibility. Having said that, you could create a ResourceQuota that includes `hard.requests.nvidia.com/gpu: "<int>"`.
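Something along these lines (a sketch; the namespace follows the usual `<project>-<domain>` naming and the numbers are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: project-quota
  namespace: myproject-development   # illustrative <project>-<domain> namespace
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 64Gi
    requests.nvidia.com/gpu: "2"     # extended resources can only be quota'd via requests.*
```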
r
Hey @average-finland-92144 thank you so much! I'm learning a lot here!! Yeah, it makes sense that the propeller doesn't somehow duplicate the K8s quota logic. That said, I think I may have led this inquiry astray a bit ... My end goal is for users to run lots of workflows via `pyflyte run`, so many that all the resources get used and workflows have to queue up. Right now, the Flyte WebUI always shows these queued executions as "Running" even though they are effectively queued by K8s (and the pods are in the "Pending" state). This behavior is confusing to users, as they think all their jobs have started and yet clearly they are stuck in line. So my questions, I guess, are:
• What can I do, if anything, to get the workflows to properly display as "queued" in the executions view? I thought that specifying quotas would help, but is there any other way? I think Ketan said that once a workflow / task gets sent to K8s, it's always "Running" according to Flyte, so I was trying to see how to keep workflows from getting dispatched without regard to resources.
• If there really isn't any facility for the above, would my only avenue be e.g. a feature request so that workflows can report the actual pod state, i.e. "Pending"?
At the end of the day, a use case is: the user kicked off 100 workflow runs at night; in the morning 90 are done or running, and the user wants to decide whether the other 10 should just be terminated, but it's unclear if those 10 ever started.
a
what do you see in the `Timeline` tab of the UI for one of those "Running" tasks? When you hover over the execution bar it displays phases. Also, this dashboard includes a metric for tasks whose Pod is in `Pending` state in K8s: https://grafana.com/grafana/dashboards/22146-flyte-user-dashboard-via-prometheus/
For the sake of completeness, I should mention that Union has a feature that lets you "fail fast" if a resource request cannot be satisfied by the underlying platform (ref). Not exactly what you want, maybe, but worth noting.
r
Let me debug one more time, but I believe the timeline always just says "running" with the wall-clock time from when the task was sent to K8s. E.g. if I have an image pull that takes 2-3 minutes, that also just shows up in the timeline as if the task itself was running for 2-3 minutes. This is for something like a hello-world task. (Also, FWIW, `TASK_RUNTIME` in the Timeline view consistently shows something like "-1hr" for a task that took about 1-2 seconds to run, and perhaps 1-3 minutes of full wall-clock time, e.g. with a slow image pull.) I like the idea of Grafana for the cluster, but I was really hoping the Flyte WebUI could just show that an execution is "later in the queue". Thanks for the "fail fast" link, but yeah, we definitely want to have a long queue of stuff to compute, like a CI system. Thanks again for all the pointers!! I guess none of this would change if we switch from the single-binary / flyte-binary Helm chart to the multi-chart deployment?
Returning to this one, @freezing-airport-6809 maybe: is there / could there be any way for a workflow / execution to show `Pending` if any or all of the containers / pods associated with the execution are effectively queued? Secondly: is there some configuration of Flyte that would exercise the case of several executions getting queued / held up by the propeller for several minutes before being sent to K8s, so I could at least test that behavior? (Thus far I've never seen this happen except internally in `map_task`.) ...