# announcements
t
Newcomer question. I was wondering if anyone had tips on how to configure Flyte to access GPUs. I see this document: https://docs.flyte.org/projects/cookbook/en/latest/auto/deployment/configure_use_gpus.html#sphx-glr-auto-deployment-configure-use-gpus-py but I am unsure how to apply the configuration shown there:
plugins:
  k8s:
    resource-tolerations:
      - nvidia.com/gpu:
        - key: "key1"
          operator: "Equal"
          value: "value1"
          effect: "NoSchedule"
I saw this thread which mentioned updating the propeller configmap. Is that the way to go? Or is there a way to update plugins via the flyte config? Thanks!
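For context, here is a minimal sketch of what the `resource-tolerations` setting quoted above is understood to do: when a task requests a resource listed in the map, the tolerations configured for that resource are added to the task's pod. This is an illustration in plain Python, not Flyte's own code; the function name `tolerations_for` and the dict layout are assumptions made for the example.

```python
# Illustration only: models the idea behind `resource-tolerations`.
# The mapping mirrors the YAML above; `tolerations_for` is a hypothetical helper.
RESOURCE_TOLERATIONS = {
    "nvidia.com/gpu": [
        {"key": "key1", "operator": "Equal", "value": "value1", "effect": "NoSchedule"},
    ],
}


def tolerations_for(requested_resources, resource_tolerations=RESOURCE_TOLERATIONS):
    """Collect the tolerations configured for every resource a task requests."""
    collected = []
    for resource_name in requested_resources:
        collected.extend(resource_tolerations.get(resource_name, []))
    return collected


gpu_task = {"cpu": "1", "memory": "4Gi", "nvidia.com/gpu": "1"}
cpu_task = {"cpu": "1", "memory": "4Gi"}

print(tolerations_for(gpu_task))  # the GPU toleration is attached
print(tolerations_for(cpu_task))  # no GPU requested, so nothing is attached
```

Under this reading, only tasks that actually request `nvidia.com/gpu` receive the toleration, so CPU-only tasks stay off the tainted GPU nodes.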
y
hey @Tom Szumowski how is your flyte cluster deployed?
t
@Yee I used the Opta GCP deployment guide and adjusted a bit after the deployment.
y
i think the answer will depend on how you deploy it. if you’re using the flyte helm chart for instance, you’ll want it so that it ends up here
t
I was able to get it running and assigned to my GPU node pool by manually editing the propeller configmap and doing a rolling restart. Just wondering if there is a best practice for editing those settings other than me editing them manually 🙂
The Opta deployment uses the helm chart. So sounds like that is the way to go for me. Thanks!
y
let us know
copied into the discussion as well btw
t
Ah my apologies. I missed those updates. Thanks!
s
You can also assign k8s resources through code. This is a sample of my code using the Kubernetes client. (I struggled to assign NVIDIA GPUs, A100 and V100, especially Multi-Instance GPU.) When you need GPUs from specific nodes, you can set a toleration or node selector to get resources from those nodes.
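from flytekit import task
from flytekitplugins.pod import Pod
from kubernetes.client import (
    V1Container,
    V1LocalObjectReference,
    V1PodSpec,
    V1ResourceRequirements,
)

# Request one GPU for the container. The resource key was garbled in the
# original message; "nvidia.com/gpu" is the standard NVIDIA resource name.
container = V1Container(
    name="container",
    resources=V1ResourceRequirements(
        requests={"cpu": "1", "memory": "4Gi", "nvidia.com/gpu": "1"},
        limits={"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": "1"},
    ),
    image=<your image name>,
)

# `volumes` is a list of V1Volume objects defined elsewhere.
pod_spec = V1PodSpec(
    containers=[container],
    image_pull_secrets=[V1LocalObjectReference(<your image_account>)],
    volumes=volumes,
    tolerations=<tolerations for the nodes you want resources from>,
    node_selector=<a node selector, needed when several nodes have the same spec>,
)


@task(
    task_config=Pod(
        pod_spec=pod_spec,
    ),
)
def your_task() -> None:
    print("this is test")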
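To make the role of the `tolerations` and `node_selector` placeholders in the pod spec above more concrete, here is a simplified sketch of how Kubernetes decides whether a pod may land on a tainted node. It is a plain-Python model of the documented matching rules, not the scheduler's code; the `pool` label and the helper names `tolerates` and `fits_node` are hypothetical, and the taint values reuse `key1`/`value1` from the config earlier in the thread.

```python
def tolerates(toleration, taint):
    """Return True if one toleration matches one node taint."""
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if key == "":
        # An empty key is only valid with Exists and matches every taint key.
        if operator != "Exists":
            return False
    elif key != taint["key"]:
        return False
    # An empty effect on the toleration matches all effects.
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False
    if operator == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def fits_node(tolerations, node_selector, node):
    """Simplified check: node labels satisfy the selector and blocking taints are tolerated."""
    labels_ok = all(node["labels"].get(k) == v for k, v in node_selector.items())
    # PreferNoSchedule is a soft preference, so only the other two effects block.
    blocking = [t for t in node["taints"] if t["effect"] in ("NoSchedule", "NoExecute")]
    taints_ok = all(any(tolerates(tol, taint) for tol in tolerations) for taint in blocking)
    return labels_ok and taints_ok


gpu_node = {
    "labels": {"pool": "gpu"},  # hypothetical label
    "taints": [{"key": "key1", "value": "value1", "effect": "NoSchedule"}],
}
gpu_toleration = {"key": "key1", "operator": "Equal", "value": "value1", "effect": "NoSchedule"}

print(fits_node([gpu_toleration], {"pool": "gpu"}, gpu_node))  # toleration and selector both match
print(fits_node([], {"pool": "gpu"}, gpu_node))                # taint is not tolerated
```

The toleration only permits scheduling on the tainted nodes; the node selector is what actually steers the pod toward them, which is why both are often set together.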