#ask-the-community

Giacomo Dabisias IT

08/01/2023, 2:29 PM
Hello! Nice to meet you all 🙂 I was looking through the documentation and was not able to find much information about the following. Let's assume that I want to create a setup with one Flyte control plane and multiple Flyte data planes across multiple physical clusters. Does Flyte support out-of-the-box quota management for different teams/projects? How would that work across different clusters? Also, is there already an observability stack to check the resource usage? Thanks!

Victor Churikov

08/01/2023, 2:46 PM
Hello! Flyte does let you configure resource usage quotas per project. Here's an example of one way to achieve this using the flytectl CLI: https://docs.flyte.org/en/latest/deployment/configuration/general.html
The installation guide for multi-cluster deployments is here: https://docs.flyte.org/en/latest/deployment/deployment/multicluster.html
When you register and execute workflows, you have to specify which "project" and "domain" to use. I haven't set up multi-cluster deployments myself yet, but I'd expect that the project-wide quota would still apply, regardless of which cluster the executions are hosted on. The multicluster guide explains how Flyte decides, based on labels, which cluster an execution is scheduled onto.

Giacomo Dabisias IT

08/01/2023, 2:54 PM
Thanks! So basically the control plane would load balance across the data planes, ensuring that the quotas are respected (based on projects), right? What about observability of the occupancy of the clusters?
(I also saw that you can pin projects/workflows to a cluster)

jeev

08/01/2023, 2:56 PM
wrt monitoring, you can use any existing k8s-native infra you might have in place to monitor task resource usage (e.g. Stackdriver, Cloudwatch, Prometheus, Grafana, Datadog, etc.) and/or logs (e.g. Stackdriver, Cloudwatch, Datadog, Loki, etc.). That is outside the scope of Flyte. Flyte also exposes Prometheus metrics for its own internals.

Giacomo Dabisias IT

08/01/2023, 2:57 PM
Thanks @jeev, was just wondering if Flyte already had something out of the box to monitor the cluster usage

jeev

08/01/2023, 2:57 PM
Not sure about Flyte-internal cluster occupancy metrics unfortunately, but this can be achieved with kube-state-metrics + Prometheus.
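As a rough sketch of the kube-state-metrics + Prometheus approach: kube-state-metrics exports a `kube_resourcequota` metric with a `type` label (`hard` or `used`), so dividing the two gives quota utilization per namespace and resource. The Python below only builds the query URL for the Prometheus HTTP API and parses an instant-query response. The Prometheus address is a placeholder, and the helper names (`build_query_url`, `quota_utilization`) are made up for illustration.

```python
from urllib.parse import urlencode

# PromQL: used / hard for every ResourceQuota entry, matched on all labels except `type`.
QUERY = (
    'kube_resourcequota{type="used"}'
    " / ignoring(type) "
    'kube_resourcequota{type="hard"}'
)


def build_query_url(base_url: str, query: str = QUERY) -> str:
    """Return the instant-query URL for a Prometheus server (base_url is an assumption)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


def quota_utilization(payload: dict) -> dict:
    """Map (namespace, resource) -> used/hard ratio from an instant-query JSON response."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error', 'unknown error')}")
    utilization = {}
    for series in payload["data"]["result"]:
        labels = series["metric"]
        _timestamp, value = series["value"]  # value arrives as a string
        utilization[(labels["namespace"], labels["resource"])] = float(value)
    return utilization


if __name__ == "__main__":
    # Placeholder address; fetch this URL with any HTTP client and pass the JSON to quota_utilization().
    print(build_query_url("http://prometheus.example:9090"))
```

Run against each data plane cluster's Prometheus, this gives a per-namespace view of how full each project quota is. It does not cover node-level occupancy.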

Giacomo Dabisias IT

08/01/2023, 2:59 PM
Thanks. I saw that quotas can be specified for cpus and memory. Can that be done also for GPUs (count)?

Victor Churikov

08/01/2023, 3:08 PM
I checked how this could be achieved for GPUs, and this section seems relevant: https://docs.flyte.org/en/latest/deployment/configuration/general.html#cluster-resources
The quotas on projects use the Kubernetes ResourceQuota resource, which is templated here: https://github.com/flyteorg/flyte/blob/master/charts/flyte-core/values.yaml#L877
According to the cluster-resources section in the configuration guide, you can create custom attributes, which can then be passed to a custom template specified in the values.yaml (assuming a Helm installation).

jeev

08/01/2023, 3:09 PM
Yes I believe so. Anything that k8s supports in its ResourceQuota object. https://kubernetes.io/docs/concepts/policy/resource-quotas/

Victor Churikov

08/01/2023, 3:10 PM
So in your case that would be something like
- key: ab_project_resource_quota
  value: |
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: project-quota
      namespace: {{ namespace }}
    spec:
      hard:
        limits.cpu: {{ projectQuotaCpu }}
        limits.memory: {{ projectQuotaMemory }}
        # this is the added line, the rest is from the default values
        # (Kubernetes only accepts the requests. prefix for extended resources such as GPUs in a quota)
        requests.nvidia.com/gpu: {{ projectQuotaGpu }}
and
attributes:
    projectQuotaCpu: "1000"
    projectQuotaMemory: 5Ti
    projectQuotaGpu: "100" # this is the added custom attribute
domain: development
project: flyteexamples
with
flytectl update cluster-resource-attribute --attrFile cra.yaml
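To see what the ResourceQuota might look like once the attributes are filled in, here is a small Python sketch that substitutes `{{ name }}` placeholders in such a template. This is not Flyte's actual templating code, only an illustration of the idea. The namespace `flyteexamples-development` is an assumption based on the project and domain above, and the sketch uses `requests.nvidia.com/gpu` because Kubernetes only accepts the `requests.` prefix for extended resources in a ResourceQuota.

```python
import re

# Template mirroring the quota above; the GPU line is the custom addition.
TEMPLATE = """\
apiVersion: v1
kind: ResourceQuota
metadata:
  name: project-quota
  namespace: {{ namespace }}
spec:
  hard:
    limits.cpu: {{ projectQuotaCpu }}
    limits.memory: {{ projectQuotaMemory }}
    requests.nvidia.com/gpu: {{ projectQuotaGpu }}
"""

# Matches `{{ name }}` with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: dict) -> str:
    """Replace every {{ name }} placeholder; fail loudly if a value is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for template variable {name!r}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    attributes = {
        "namespace": "flyteexamples-development",  # assumed <project>-<domain> naming
        "projectQuotaCpu": "1000",
        "projectQuotaMemory": "5Ti",
        "projectQuotaGpu": "100",
    }
    print(render(TEMPLATE, attributes))
```

Failing on a missing variable is deliberate here: a quota with an unfilled placeholder would be rejected by the cluster anyway, so it is easier to catch before applying.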

Giacomo Dabisias IT

08/01/2023, 3:54 PM
Amazing, thank you all for your help. Just wanted to double check that this assumption is correct:
"So basically the control plane would load balance across the data planes, ensuring that the quotas are respected (based on projects), right?"
Additional question about this: I saw that Flyte supports the YuniKorn scheduler, which has support for gang scheduling, hierarchical queues, fair share, etc. If we were to use that, I'm assuming we would just need to configure it directly in each data plane cluster, and that it operates separately from the existing project quotas?

jeev

08/02/2023, 4:25 PM
i suspect that would still respect project quotas - this is enforced by the k8s control plane. where did you hear about flyte support for yunikorn?

Giacomo Dabisias IT

08/02/2023, 4:33 PM
From here

jeev

08/02/2023, 4:36 PM
ah ok. if im reading that correctly, that's for the kubeflow training operator. gang scheduling actually makes sense there.
i think there is definitely interest in integrating with yunikorn natively from flyte for queues and fair share, but i don't believe this is available out of the box now.

Giacomo Dabisias IT

08/02/2023, 5:21 PM
fair, that's ok 🙂. Just wanted to understand what we can expect from the scheduling side of things. Thanks!