# ask-the-community
t
Has anyone been able to have multiple resource-tolerations when requesting gpus for a job? I have 2 node pools for single gpus and multi-gpus, and I want to be able to only use the different gpu machine types when needed: a job wants 4 gpus on the same instance. But I noticed that when requesting 1 gpu flyte defaults to the last toleration when requesting resources. And furthermore when launching a task I don't see any place to determine the tolerations associated with that task.
k8s:
    plugins:
      # -- Configuration section for all K8s specific plugins [Configuration structure](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/pluginmachinery/flytek8s/config)
      k8s:
        default-env-vars: []
        #  DEFAULT_ENV_VAR: VALUE
        default-cpus: 100m
        default-memory: 100Mi
        resource-tolerations:
          - nvidia.com/gpu:
            - key: "num-gpus"
              operator: "Equal"
              value: 1
              effect: "NoSchedule"
          - nvidia.com/gpu:
            - key: "num-gpus"
              operator: "Equal"
              value: 4
              effect: "NoSchedule"
y
is this a list? I think this is a map.
t
I have it as a list with just
k8s:
    plugins:
      # -- Configuration section for all K8s specific plugins [Configuration structure](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/pluginmachinery/flytek8s/config)
      k8s:
        default-env-vars: []
        #  DEFAULT_ENV_VAR: VALUE
        default-cpus: 100m
        default-memory: 100Mi
        resource-tolerations:
          - nvidia.com/gpu:
            - key: "num-gpus"
              operator: "Equal"
              value: 1
              effect: "NoSchedule"
as I was testing single gpu configuration. It works. I was able to run tasks on gke nodes with taints nvidia.com/gpu=present and num-gpus=1
If I weren't to use a list, how would I be able to have tuples of key, value taint constraints under resource-tolerations?
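(For reference: if resource-tolerations is parsed as a map rather than a list, each resource name would key a list of tolerations, so multiple key/value taint pairs could sit under one resource entry. A sketch of that shape in plain Python — the structure is assumed from the YAML above, not taken from the Flyte source:)

```python
# Assumed shape of resource-tolerations as a map: resource name -> list of tolerations.
# Field names mirror the YAML config above; this is a sketch, not Flyte's actual parser.
resource_tolerations = {
    "nvidia.com/gpu": [
        {"key": "num-gpus", "operator": "Equal", "value": "1", "effect": "NoSchedule"},
        {"key": "num-gpus", "operator": "Equal", "value": "4", "effect": "NoSchedule"},
    ],
}

# Both taint constraints now live under the single nvidia.com/gpu entry.
assert len(resource_tolerations["nvidia.com/gpu"]) == 2
```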
y
so the goal is to make it so that the code looks at the value of the resource request, not just the fact that it’s there.
t
yes that is correct
y
hmm
the code won’t do that today
t
If I were able to set the tolerations manually, that would also work
for that task
y
i see.
cc @Dan Rammer (hamersaw) and @Ketan (kumare3) - this should be a broader discussion i feel.
what would be your preference, Tarmily?
t
something like task(request=Resources(), tolerations={"nvidia.com/gpu": "present", "num_gpus": 1, ...})
I think it would be good to be able to set it manually through a hardcoded value or config. I haven't really given task config variables a shot yet, but I imagine that would be best for when I use this for larger use cases
y
what do you mean by task config variables? like the autodetection of values?
t
That is because even if n gpus are requested, there are workloads that require significantly more RAM per gpu than others. So there could be high-RAM-per-gpu machines in our case
y
i feel like the config can get messy quickly… do admins have to set n options in the settings? or will there be logic to round to the nearest number detected? what about partial gpus in the future? etc.
t
my bad, I didn't mean config. I recalled somewhere in the docs where args in the task decorator could be filled in later, like here, where the image per task is filled in via the .config
actually, ignore my config comment. I see that I can just fill in values like the multi-container task; that should work if I need to configure tolerations
d
@Tarmily Wen can you help me understand a little better. Are you saying you could use the pod task to manually inject resource tolerations in the PodSpec? If so, this seems to be another instance where it would be beneficial to expose more k8s-specific configuration to regular python-tasks.
t
That could be done, but this method seems a little heavy-handed. I see resource-toleration injection as another form of resource request. When asking for resources, there are only so many nodes that match the cpu, mem, gpu, and storage requests. I am suggesting just adding a simple way to attach more restrictions to the resource request, such as key/value pairs for taints, to allow for more efficient node resource allocation when the task demands it. For example, I don't want a 1-gpu task to run on a 4-gpu node, even though it has enough resources, because that would block an entire 4-gpu task from running on a 4-gpu node. So I would like to add a taint to the resource request to prevent that from happening. Something like Resources(cpu, mem, storage, gpu, taints: Dict[str, str])
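(To make the blocking scenario concrete, here is a minimal sketch of Kubernetes taint/toleration matching in plain Python — simplified in that it only gates on NoSchedule taints and ignores effect matching on the toleration side:)

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """True if one toleration matches one taint ('Equal'/'Exists' rules, simplified)."""
    if toleration.get("key") != taint["key"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value") == taint["value"]

def schedulable(pod_tolerations: list, node_taints: list) -> bool:
    """A pod fits a node only if every NoSchedule taint is matched by some toleration."""
    return all(
        any(tolerates(t, taint) for t in pod_tolerations)
        for taint in node_taints
        if taint.get("effect") == "NoSchedule"
    )

# A node in the multi-gpu pool, tainted num-gpus=4.
four_gpu_node = [{"key": "num-gpus", "value": "4", "effect": "NoSchedule"}]

# A 1-gpu task tolerating only num-gpus=1 cannot land there...
one_gpu_task = [{"key": "num-gpus", "operator": "Equal", "value": "1", "effect": "NoSchedule"}]
assert not schedulable(one_gpu_task, four_gpu_node)

# ...while a 4-gpu task can.
four_gpu_task = [{"key": "num-gpus", "operator": "Equal", "value": "4", "effect": "NoSchedule"}]
assert schedulable(four_gpu_task, four_gpu_node)
```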
k
@Tarmily Wen I actually think the way we have done pod tasks is wrong
Pod spec should just be another config on the task
Along with other task config
This is because we want to keep base container simple
We intend to make it faster to run just container tasks
Pod tasks can hinder that optimization
d
@Tarmily Wen ah ok, I think I get it now. So from your example above (i.e. `task(request=Resources(), tolerations={"nvidia.com/gpu": "present", "num_gpus": 1, ...})`), you expect that this configuration can be parsed and Flyte will set the correct tolerations on the nodes to satisfy the requirements? IMO this would be very difficult, and while it may fit your use-case well, it may not cover all taint/toleration differences. I have a few thoughts here: (1) If we are going to call them `tolerations`, which I think is the correct way to do it, then the spec should mimic tolerations (i.e. include `operator`, `value`, etc.) so that these can be set without requiring parsing and generating. (2) I prefer to have a simple API that covers 99% of the use-cases and then offer more extensive configuration options to make everything available. The current setup (i.e. injecting a single toleration from configuration if gpus are requested) covers the former. IMO for the latter it makes sense to make toleration configuration directly available to python tasks - this could be done by using a PodTemplate to specify default pod configuration. This would work similarly to the default PodTemplate work, where it serves as the basis for the Pod. This `pod_template` (or whatever we decide to call it) could be exposed on ContainerTask, PythonTask, etc. to allow any Pod configuration. I think this solves a number of our problems. So this use-case could be set like (below) - but you could just as easily call a function to generate the `pod_template`.
@task(
    request=Resources(),
    pod_template=V1PodTemplate(
        spec=V1PodSpec(
            tolerations=[
                V1Toleration(
                    key="num-gpus",
                    operator="Equal",
                    value="1",  # taint/toleration values are strings in k8s
                    effect="NoSchedule",
                ),
            ]
        )
    ),
)
VERY open to a larger discussion on this - it's certainly easier to get it right the first time rather than introducing an API and then changing it.
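(For illustration, here is the toleration fragment that such a pod_template would contribute, rendered into the plain-dict form a Kubernetes PodSpec serializes to. The helper name is hypothetical and Flyte's exact merge behavior is an assumption:)

```python
def gpu_tolerations_fragment(num_gpus: int) -> dict:
    # Hypothetical helper: builds the PodSpec fragment described in the
    # pod_template sketch above. Kubernetes taint/toleration values are
    # strings, so the integer gpu count is stringified here.
    return {
        "tolerations": [
            {
                "key": "num-gpus",
                "operator": "Equal",
                "value": str(num_gpus),
                "effect": "NoSchedule",
            }
        ]
    }

# e.g. a 4-gpu task would carry value "4"
assert gpu_tolerations_fragment(4)["tolerations"][0]["value"] == "4"
```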
t
You are right, I think (2) is the better approach. It would allow for injecting not only key/value but also the operator and effect.
y
yeah letting users access the pod template more readily sounds like the best solution. one reason is that I’d hope that this becomes less and less necessary to do as time goes on. what i mean is, what you’re really doing is better scaling/scheduling. 4 gpu nodes getting blocked by 1 gpu tasks is poor scheduling on k8s’s part. but scheduling four 1 gpu tasks on a 4 gpu node (i assume that’s possible?) might be the right thing to do. hopefully the schedulers and scalers in the future can better do this.
d
@Yee should we create an issue to track this? Seems like it may be a viable solution to expose a little more configuration - but should probably have some discussion right?
y
yeah we should
would you mind putting that in?