• Tom Szumowski

    2 weeks ago
    Newcomer question: I was wondering if anyone has tips on how to configure Flyte to access GPUs. I see this document: https://docs.flyte.org/projects/cookbook/en/latest/auto/deployment/configure_use_gpus.html#sphx-glr-auto-deployment-configure-use-gpus-py But I am curious how to apply the configuration shown there:
    plugins:
      k8s:
        resource-tolerations:
          - nvidia.com/gpu:
            - key: "key1"
              operator: "Equal"
              value: "value1"
              effect: "NoSchedule"
    I saw this thread which mentioned updating the propeller configmap. Is that the way to go? Or is there a way to update plugins via the Flyte config? Thanks!
  • Yee

    2 weeks ago
    hey @Tom Szumowski how is your flyte cluster deployed?
  • Tom Szumowski

    2 weeks ago
    @Yee I used the Opta GCP deployment guide and adjusted a bit after the deployment.
  • Yee

    2 weeks ago
    i think the answer will depend on how you deploy it. if you’re using the flyte helm chart for instance, you’ll want it so that it ends up here
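    For reference, with the Flyte Helm chart the propeller plugin settings are generated from the chart's `configmap` values, so a values override along these lines should land the tolerations in the right place. This is a sketch only; the exact key layout varies by chart version, so check the default values.yaml shipped with your chart before applying:

    ```yaml
    # values-override.yaml (key layout is an assumption; verify against the
    # default values.yaml of your chart version)
    configmap:
      k8s:
        plugins:
          k8s:
            resource-tolerations:
              - nvidia.com/gpu:
                  - key: "key1"
                    operator: "Equal"
                    value: "value1"
                    effect: "NoSchedule"
    ```

    Passing this via `helm upgrade -f values-override.yaml` regenerates the propeller configmap, which avoids editing it by hand.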
  • Tom Szumowski

    2 weeks ago
    I was able to get it running and assigned to my GPU node pool by manually editing the propeller configmap and doing a rolling restart. Just wondering if there is a best practice for editing those other than me manually editing 🙂
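    For anyone following along, the manual route described above amounts to something like this (the configmap and deployment names are assumptions; they vary by install and namespace):

    ```shell
    # Edit propeller's config in place (name/namespace may differ in your install)
    kubectl -n flyte edit configmap flyte-propeller-config

    # Restart propeller so it picks up the new config
    kubectl -n flyte rollout restart deployment/flytepropeller
    ```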
  • The Opta deployment uses the helm chart. So sounds like that is the way to go for me. Thanks!
  • Yee

    2 weeks ago
    let us know
  • copied into the discussion as well btw
  • Tom Szumowski

    2 weeks ago
    Ah my apologies. I missed those updates. Thanks!
  • SeungTaeKim

    5 days ago
    You can also assign k8s resources through scripts. This is a sample of my code using the Kubernetes client. (I struggled to assign NVIDIA GPUs, A100 & V100, especially Multi-Instance GPU.) When you assign GPUs on specific nodes, you may need to set up a toleration or node selector so pods get resources from those nodes.
    from flytekit import task
    from flytekitplugins.pod import Pod
    from kubernetes.client import (
        V1Container,
        V1LocalObjectReference,
        V1PodSpec,
        V1ResourceRequirements,
    )

    container = V1Container(
        name="container",
        resources=V1ResourceRequirements(
            # The extended resource key for NVIDIA GPUs is nvidia.com/gpu
            requests={"cpu": "1", "memory": "4Gi", "nvidia.com/gpu": "1"},
            limits={"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": "1"},
        ),
        image=<your image name>,
    )

    pod_spec = V1PodSpec(
        containers=[container],
        image_pull_secrets=[V1LocalObjectReference(<your image_account>)],
        volumes=volumes,  # volumes defined elsewhere
        tolerations=<your node toleration which want to assign resource>,
        node_selector=<you will need some specific node selector when there are the same spec nodes>,
    )

    @task(
        task_config=Pod(pod_spec=pod_spec),
    )
    def your_task() -> None:
        print("this is test")
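    To make the toleration and node-selector placeholders above concrete, here is a minimal sketch of what they often look like for a GPU node pool, written as plain dicts for illustration. The taint key and the GKE accelerator label are assumptions; match them to whatever taints and labels your nodes actually carry:

    ```python
    # Tolerate the nvidia.com/gpu taint commonly applied to GPU node pools
    # (taint key is an assumption; check `kubectl describe node` on your cluster).
    gpu_tolerations = [
        {
            "key": "nvidia.com/gpu",
            "operator": "Exists",   # match the taint regardless of its value
            "effect": "NoSchedule",
        }
    ]

    # Pin the pod to a specific accelerator type. This label is GKE-specific
    # (assumption); other clouds and on-prem clusters use different node labels.
    gpu_node_selector = {"cloud.google.com/gke-accelerator": "nvidia-tesla-v100"}

    print(gpu_tolerations[0]["key"])   # nvidia.com/gpu
    print(gpu_node_selector)
    ```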