# ml-and-mlops-questions
s
Hi, I'd like to specify the type and number of GPUs when running tasks on Google Kubernetes Engine. Currently, I do that with a task specification like this:
@task(
    requests=Resources(cpu="8", mem="54Gi", gpu="2"),
    limits=Resources(cpu="100", mem="1Ti"),
    pod_template=PodTemplate(
        pod_spec=V1PodSpec(
            containers=[
                V1Container(
                    name="primary",
                ),
            ],
            node_selector={
                "cloud.google.com/gke-accelerator": "nvidia-l4",
                "cloud.google.com/gke-accelerator-count": "2",
            },
        )
    ),
)
I see that Flyte also has a feature for selecting GPUs: https://docs.flyte.org/en/latest/api/flytekit/extras.accelerators.html However, if I remove the pod_template and just add the accelerator kwarg, then flytepropeller gives the following error:
E0113 12:02:55.686281       1 workers.go:103] error syncing '-': failed at Node[-]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [container]: [GKE Warden constraints violations[] failed to create resource, caused by: admission webhook "warden-validating.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more constraints.
Violations details: {"[denied by autogke-gpu-limitation]":["When requesting 'nvidia.com/gpu' resources, you must specify either node selector 'cloud.google.com/gke-accelerator' with accelerator type or node selector 'cloud.google.com/compute-class' with existing custom compute class which has at least one GPU priority rule."]}
This suggests that the right GKE config is not properly set by providing the accelerator kwarg. Is this supposed to happen? If not, what is the point of the accelerator kwarg?
g
Hey Pim, lemme share how I use it
Here's a task that uses an L4 GPU on GKE and requests some other resources.
@task(
    container_image=image_spec,
    requests=Resources(cpu="1", mem="2G"),
    accelerator=GPUAccelerator("nvidia-l4"),
    limits=Resources(cpu="4", mem="7G", gpu="1"),
    timeout=timedelta(minutes=10),
)
It's also important that your Flyte install has these settings in its config: https://github.com/flyteorg/flyte/blob/b8fb68df84675f25befea766a19f392fb06ae7e6/charts/flyte-binary/gke-starter.yaml#L79-L90
Also, as you can see, the `gpu` request should be set in the `limits` instead of the `requests`: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#using-device-plugins
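To make the Warden error from the question concrete, here is a minimal sketch in plain Python (no Flyte or Kubernetes client needed) of the two pod fields involved: the `cloud.google.com/gke-accelerator` node selector and the `nvidia.com/gpu` entry under `limits`. The helper names `gpu_pod_fragment` and `passes_gpu_selector_rule` are made up for illustration, and the check only approximates the rule quoted in the error message; it is not GKE's actual implementation.

```python
GKE_ACCELERATOR = "cloud.google.com/gke-accelerator"
GKE_COMPUTE_CLASS = "cloud.google.com/compute-class"
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_pod_fragment(accelerator_type: str, gpu_count: int) -> dict:
    """Build only the pod-spec fields relevant to GPU scheduling (hypothetical helper)."""
    return {
        "nodeSelector": {GKE_ACCELERATOR: accelerator_type},
        "containers": [
            {
                "name": "primary",
                # GPUs go under limits; Kubernetes uses the limit as the request.
                "resources": {"limits": {GPU_RESOURCE: str(gpu_count)}},
            }
        ],
    }


def passes_gpu_selector_rule(pod: dict) -> bool:
    """Approximate the quoted rule: a pod asking for GPUs needs an
    accelerator or compute-class node selector. (Assumption: the real
    check also validates the compute class itself, which is skipped here.)"""
    wants_gpu = any(
        GPU_RESOURCE in container.get("resources", {}).get(kind, {})
        for container in pod.get("containers", [])
        for kind in ("limits", "requests")
    )
    selector = pod.get("nodeSelector", {})
    has_selector = GKE_ACCELERATOR in selector or GKE_COMPUTE_CLASS in selector
    return (not wants_gpu) or has_selector


pod = gpu_pod_fragment("nvidia-l4", 1)
print(passes_gpu_selector_rule(pod))  # True

# Same pod with the node selector dropped, as in the failing case.
pod_without_selector = {**pod, "nodeSelector": {}}
print(passes_gpu_selector_rule(pod_without_selector))  # False
```

If the rendered pod for a task looks like the second case (a GPU limit but no accelerator node selector), the request would be expected to fail the same way as in the error above.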
s
ah, thanks, that's helpful!
g
Works at least with `flytekit` version `1.13.7` and `flyte-binary` chart version `1.13.2`
f
Good thing to add to docs
s
Does this then automatically set "cloud.google.com/gke-accelerator-count"? Shouldn't we specify that label to Flyte as well? Or is it not necessary?
g
No, it won't. But I'm not sure it's necessary unless you use the count to select certain node pools
s
Ah okay ty
g
Otherwise k8s, using the `gpu` limit, will be smart enough to find you a GPU that fulfills your resource request
So if you have only 1 node pool that has machines with 8 CPU, 54GB memory and 2 L4s, it's almost guaranteed to be scheduled there even without the count label
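The scheduling argument can be sketched as a toy fit check in plain Python. This is not the real Kubernetes scheduler: `Node`, `PodRequest`, and `fits` are hypothetical names, node capacity is treated as fully allocatable, and the only rules modelled are that node-selector labels must match and the node must have at least as much of every requested resource.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    cpu: int
    mem_gib: int
    gpus: int
    labels: dict = field(default_factory=dict)


@dataclass
class PodRequest:
    cpu: int
    mem_gib: int
    gpus: int
    node_selector: dict = field(default_factory=dict)


def fits(pod: PodRequest, node: Node) -> bool:
    """Toy check: selector labels match and the node has enough of each resource."""
    labels_ok = all(node.labels.get(k) == v for k, v in pod.node_selector.items())
    resources_ok = (
        pod.cpu <= node.cpu
        and pod.mem_gib <= node.mem_gib
        and pod.gpus <= node.gpus
    )
    return labels_ok and resources_ok


# One GPU pool shaped like the example (8 CPU, 54 GiB, 2 L4s) and one CPU-only pool.
l4_node = Node(cpu=8, mem_gib=54, gpus=2,
               labels={"cloud.google.com/gke-accelerator": "nvidia-l4"})
cpu_node = Node(cpu=8, mem_gib=54, gpus=0)

# The pod selects the accelerator type only; there is no count label.
pod = PodRequest(cpu=1, mem_gib=2, gpus=2,
                 node_selector={"cloud.google.com/gke-accelerator": "nvidia-l4"})

print([fits(pod, n) for n in (l4_node, cpu_node)])  # [True, False]
```

In this simplified model the GPU count in the resource request already rules out nodes with too few GPUs, which is why the count label only matters when several node pools share the same accelerator type.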