# ask-the-community
g
Hello! Nice to meet you all 🙂 I was looking through the documentation and wasn't able to find much information about the following:
• Let's assume I want to create a setup with one Flyte control plane and multiple Flyte data planes, across multiple physical clusters. Does Flyte support quota management for different teams/projects out of the box? How would that work across multiple clusters?
• Is there an existing observability stack for checking resource usage?
Thanks!
v
Hello! Flyte does let you configure resource usage quotas per project; here's an example of one way to achieve this using the flytectl CLI: https://docs.flyte.org/en/latest/deployment/configuration/general.html
The installation guide for multi-cluster deployments is here: https://docs.flyte.org/en/latest/deployment/deployment/multicluster.html
When you register and execute workflows, you have to specify which “project” and “domain” to use. I haven't set up a multi-cluster deployment myself yet, but I'd expect the project-wide quota to still apply regardless of which cluster hosts the executions. The multicluster guide explains how an execution is assigned to a cluster based on labels.
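For reference, specifying the project and domain at registration/execution time might look like this (a sketch using pyflyte; the project, domain, file, and workflow names here are placeholders):

```shell
# Register workflows under a specific project and domain
pyflyte register --project flyteexamples --domain development workflows/
# Run one remotely in that same project/domain
pyflyte run --remote -p flyteexamples -d development workflows/example.py my_workflow
```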
g
Thanks! So basically the control plane would load balance across the data planes, ensuring that the quotas are respected (based on projects), right? What about observability of the occupancy of the clusters?
(I also saw that you can pin projects/workflows to a cluster)
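The label-based placement mentioned above can be sketched as a toy model. This is not Flyte's actual code; the label names, cluster names, and weights are hypothetical, loosely mirroring the label-to-cluster mapping idea from the multicluster guide:

```python
import random

# Hypothetical mapping: each execution label points to a list of
# (data-plane cluster, weight) pairs; executions carrying that label
# are spread across those clusters in proportion to the weights.
LABEL_CLUSTER_MAP = {
    "team-a": [("dataplane-1", 0.7), ("dataplane-2", 0.3)],
    "team-b": [("dataplane-2", 1.0)],  # "pinned": everything goes to one cluster
}

def pick_cluster(label: str, rng: random.Random) -> str:
    """Pick a target cluster for an execution carrying the given label."""
    clusters = LABEL_CLUSTER_MAP[label]
    names = [name for name, _ in clusters]
    weights = [weight for _, weight in clusters]
    return rng.choices(names, weights=weights, k=1)[0]
```

A label mapped to a single cluster with weight 1.0 behaves like the pinning you mentioned. The quota enforcement itself happens separately, via the ResourceQuota in each cluster's project namespace.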
j
re: monitoring, you can use any existing k8s-native infra you have in place to monitor task resource usage (e.g. Stackdriver, CloudWatch, Prometheus, Grafana, Datadog) and/or logs (e.g. Stackdriver, CloudWatch, Datadog, Loki). that part is outside the scope of Flyte. Flyte also exposes Prometheus metrics for its own internals.
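Flyte's internal Prometheus metrics can be scraped like any other pod's. A sketch of a scrape job follows; the namespace, pod label values, and metrics path are assumptions, so check your own deployment's values for the actual port and labels:

```yaml
scrape_configs:
  - job_name: flyte
    metrics_path: /metrics
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [flyte]   # assumed install namespace
    relabel_configs:
      # keep only the Flyte control-plane pods (label values assumed)
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        regex: flyteadmin|flytepropeller
        action: keep
```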
g
Thanks @jeev, I was just wondering if Flyte already had something out of the box for monitoring cluster usage
j
Not sure about Flyte-internal cluster occupancy metrics unfortunately, but this can be achieved with kube-state-metrics + Prometheus.
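For occupancy, a couple of PromQL queries over kube-state-metrics give a quick picture (metric names are from kube-state-metrics v2; the GPU resource label assumes the NVIDIA device plugin is in use):

```promql
# Fraction of allocatable CPU currently requested, cluster-wide
sum(kube_pod_container_resource_requests{resource="cpu"})
  / sum(kube_node_status_allocatable{resource="cpu"})

# Same ratio for GPUs (nvidia.com/gpu is sanitized to nvidia_com_gpu)
sum(kube_pod_container_resource_requests{resource="nvidia_com_gpu"})
  / sum(kube_node_status_allocatable{resource="nvidia_com_gpu"})
```

Filtering by namespace (e.g. `namespace=~"myproject-.*"`, with a placeholder project name) would give a per-project view, since Flyte's default template creates one namespace per project/domain pair.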
g
Thanks. I saw that quotas can be specified for cpus and memory. Can that be done also for GPUs (count)?
v
I checked how this could be achieved for GPUs, and this section seems relevant: https://docs.flyte.org/en/latest/deployment/configuration/general.html#cluster-resources
Project quotas use the Kubernetes `ResourceQuota` resource, which is templated here: https://github.com/flyteorg/flyte/blob/master/charts/flyte-core/values.yaml#L877
According to the cluster-resources section of the configuration guide, you can define custom attributes, which can then be passed to a custom template specified in values.yaml (assuming a Helm installation)
j
Yes I believe so. Anything that k8s supports in its ResourceQuota object. https://kubernetes.io/docs/concepts/policy/resource-quotas/
v
So in your case that would be something like

```yaml
- key: ab_project_resource_quota
  value: |
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: project-quota
      namespace: {{ namespace }}
    spec:
      hard:
        limits.cpu: {{ projectQuotaCpu }}
        limits.memory: {{ projectQuotaMemory }}
        limits.nvidia.com/gpu: {{ projectQuotaGpu }} # this is the added line, the rest is from the default values
```
and
```yaml
attributes:
  projectQuotaCpu: "1000"
  projectQuotaMemory: 5Ti
  projectQuotaGpu: "100" # this is the added custom attribute
domain: development
project: flyteexamples
```
with
```shell
flytectl update cluster-resource-attribute --attrFile cra.yaml
```
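Once applied, it might be worth sanity-checking both sides (a sketch; the `flyteexamples-development` namespace assumes Flyte's default `<project>-<domain>` namespace template):

```shell
# Confirm flyteadmin stored the attributes
flytectl get cluster-resource-attribute -p flyteexamples -d development
# Confirm the templated ResourceQuota landed in the target cluster
kubectl get resourcequota -n flyteexamples-development -o yaml
```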
g
Amazing, thank you all for your help. Just wanted to double check that this assumption is correct:
> So basically the control plane would load balance across the data planes, ensuring that the quotas are respected (based on projects), right?
Additional question about this: I saw that Flyte supports the Yunikorn scheduler, which has support for gang scheduling, hierarchical queues, fair share, etc. If we were to use it, I'm assuming we would just configure it directly in each data-plane cluster, and it would operate separately from the existing project quotas?
j
i suspect that would still respect project quotas - this is enforced by the k8s control plane. where did you hear about flyte support for yunikorn?
g
From here
j
ah ok. if im reading that correctly, that's for the kubeflow training operator. gang scheduling actually makes sense there.
i think there is definitely interest in integrating with yunikorn natively from flyte for queues and fair share, but i don't believe this is available out of the box now.
g
fair, that's ok 🙂. Just wanted to understand what we can expect from the scheduling side of things. Thanks!