# ask-the-community
t
Has anyone been able to have multiple resource-tolerations when requesting gpus for a job? I have 2 node pools for single gpus and multi-gpus, and I want to be able to only use the different gpu machine types when needed: a job wants 4 gpus on the same instance. But I noticed that when requesting 1 gpu flyte defaults to the last toleration when requesting resources. And furthermore when launching a task I don't see any place to determine the tolerations associated with that task.
k8s:
    plugins:
      # -- Configuration section for all K8s specific plugins [Configuration structure](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/pluginmachinery/flytek8s/config)
      k8s:
        default-env-vars: []
        #  DEFAULT_ENV_VAR: VALUE
        default-cpus: 100m
        default-memory: 100Mi
        resource-tolerations:
          - nvidia.com/gpu:
            - key: "num-gpus"
              operator: "Equal"
              value: 1
              effect: "NoSchedule"
          - nvidia.com/gpu:
            - key: "num-gpus"
              operator: "Equal"
              value: 4
              effect: "NoSchedule"
y
is this a list? I think this is a map.
t
I have it as a list with just
k8s:
    plugins:
      # -- Configuration section for all K8s specific plugins [Configuration structure](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/pluginmachinery/flytek8s/config)
      k8s:
        default-env-vars: []
        #  DEFAULT_ENV_VAR: VALUE
        default-cpus: 100m
        default-memory: 100Mi
        resource-tolerations:
          - nvidia.com/gpu:
            - key: "num-gpus"
              operator: "Equal"
              value: 1
              effect: "NoSchedule"
as I was testing single gpu configuration. It works. I was able to run tasks on gke nodes with taints nvidia.com/gpu=present and num-gpus=1
If I weren't to use a list, how would I be able to have tuples of key, value taint constraints under resource-tolerations?
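(For reference: if resource-tolerations is parsed as a map rather than a list, each resource name would key a list of tolerations, so multiple key/value taint pairs could sit under one resource entry. A sketch of that shape in plain Python — the structure is assumed from the YAML above, not taken from the Flyte source:)

```python
# Assumed shape of resource-tolerations as a map: resource name -> list of tolerations.
# Field names mirror the YAML config above; this is a sketch, not Flyte's actual parser.
resource_tolerations = {
    "nvidia.com/gpu": [
        {"key": "num-gpus", "operator": "Equal", "value": "1", "effect": "NoSchedule"},
        {"key": "num-gpus", "operator": "Equal", "value": "4", "effect": "NoSchedule"},
    ],
}

# Both taint constraints now live under the single nvidia.com/gpu entry.
assert len(resource_tolerations["nvidia.com/gpu"]) == 2
```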
y
so the goal is to make it so that the code looks at the value of the resource request, not just the fact that it’s there.
t
yes that is correct
y
hmm
the code won’t do that today
t
If I were able to set the tolerations manually, that would also work
for that task
y
i see.
cc @Dan Rammer (hamersaw) and @Ketan (kumare3) - this should be a broader discussion i feel.
what would be your preference, Tarmily?
t
something like task(request=Resources(), tolerations={"nvidia.com/gpu": "present", "num_gpus": 1, ...})
I think it would be good to be able to set it manually through a hardcoded value or config. I haven't really given task config variables a shot yet, but I imagine that would be best for when I use this for larger use cases
y
what do you mean by task config variables? like the autodetection of values?
t
That is because even if n gpus are requested, there are workloads that require significantly more RAM per gpu than others. So there could be high-RAM-per-gpu machines in our case
y
i feel like the config can get messy quickly… do admins have to set n options in the settings? or will there be logic to round to the nearest number detected? what about partial gpus in the future? etc.
t
my bad, I didn't mean config. I recalled somewhere in the docs where args in the task decorator could be filled in later, like here, where the image per task is filled in via the .config
actually, ignore my config comment. I see that I can just fill in values like the multi-container task; that should work if I need to configure tolerations
d
@Tarmily Wen can you help me understand a little better. Are you saying you could use the pod task to manually inject resource tolerations in the PodSpec? If so, this seems to be another instance where it would be beneficial to expose more k8s-specific configuration to regular python-tasks.
t
That could be done, but this method seems a little heavy-handed. I see resource-toleration injection as another form of resource request. When asking for resources, there are only so many nodes that match the cpu, mem, gpu, and storage requests. I am suggesting just adding a simple way to attach more restrictions to the resource request, such as key/value pairs for taints, to allow for more efficient node resource allocation when the task demands it. For example, I don't want a 1-gpu task to run on a 4-gpu node, even though it has enough resources, because that would block an entire 4-gpu task from running on a 4-gpu node. So I would like to add a taint to the resource request to prevent that from happening. Something like Resources(cpu, mem, storage, gpu, taints: Dict[str, str])
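(To make the blocking scenario concrete, here is a minimal sketch of Kubernetes taint/toleration matching in plain Python — simplified in that it only gates on NoSchedule taints and ignores effect matching on the toleration side:)

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """True if one toleration matches one taint ('Equal'/'Exists' rules, simplified)."""
    if toleration.get("key") != taint["key"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value") == taint["value"]

def schedulable(pod_tolerations: list, node_taints: list) -> bool:
    """A pod fits a node only if every NoSchedule taint is matched by some toleration."""
    return all(
        any(tolerates(t, taint) for t in pod_tolerations)
        for taint in node_taints
        if taint.get("effect") == "NoSchedule"
    )

# A node in the multi-gpu pool, tainted num-gpus=4.
four_gpu_node = [{"key": "num-gpus", "value": "4", "effect": "NoSchedule"}]

# A 1-gpu task tolerating only num-gpus=1 cannot land there...
one_gpu_task = [{"key": "num-gpus", "operator": "Equal", "value": "1", "effect": "NoSchedule"}]
assert not schedulable(one_gpu_task, four_gpu_node)

# ...while a 4-gpu task can.
four_gpu_task = [{"key": "num-gpus", "operator": "Equal", "value": "4", "effect": "NoSchedule"}]
assert schedulable(four_gpu_task, four_gpu_node)
```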
k
@Tarmily Wen I actually think the way we have done pod tasks is wrong
Pod spec should just be another config on the task
Along with other task config
This is because we want to keep base container simple
We intend to make it faster to run just container tasks
Pod tasks can hinder that optimization
d
@Tarmily Wen ah ok, I think I get it now. So from your example above (i.e. `task(request=Resources(), tolerations={"nvidia.com/gpu": "present", "num_gpus": 1, ...})`), you expect that this configuration can be parsed and Flyte will set the correct tolerations on the nodes to satisfy the requirements? IMO this would be very difficult, and while it may fit your use-case well, it may not cover all taint/toleration differences. I have a few thoughts here: (1) If we are going to call them `tolerations`, which I think is the correct way to do it, then the spec should mimic tolerations (i.e. include `operator`, `value`, etc.) so that these can be set without requiring parsing and generating. (2) I prefer to have a simple API that covers 99% of the use-cases and then offer more extensive configuration options to make everything available. The current setup (i.e. injecting a single toleration from configuration if gpus are requested) covers the former. IMO for the latter it makes sense to make toleration configuration directly available to python tasks - this could be done by using a PodTemplate to specify default pod configuration. This would work similarly to the default PodTemplate work, where it serves as the basis for the Pod. This `pod_template` (or whatever we decide to call it) could be exposed on ContainerTask, PythonTask, etc. to allow any Pod configuration. I think this solves a number of our problems. So this use-case could be set like (below) - but you could just as easily call a function to generate the `pod_template`.
@task(
    request=Resources(),
    pod_template=V1PodTemplate(
        spec=V1PodSpec(
            tolerations=[
                V1Toleration(
                    key="num-gpus",
                    operator="Equal",
                    value="1",  # taint/toleration values are strings in k8s
                    effect="NoSchedule",
                ),
            ]
        )
    ),
)
VERY open to a larger discussion on this - it's certainly easier to get it right the first time rather than introducing an API and then changing it.
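(For illustration, here is the toleration fragment that such a pod_template would contribute, rendered into the plain-dict form a Kubernetes PodSpec serializes to. The helper name is hypothetical and Flyte's exact merge behavior is an assumption:)

```python
def gpu_tolerations_fragment(num_gpus: int) -> dict:
    # Hypothetical helper: builds the PodSpec fragment described in the
    # pod_template sketch above. Kubernetes taint/toleration values are
    # strings, so the integer gpu count is stringified here.
    return {
        "tolerations": [
            {
                "key": "num-gpus",
                "operator": "Equal",
                "value": str(num_gpus),
                "effect": "NoSchedule",
            }
        ]
    }

# e.g. a 4-gpu task would carry value "4"
assert gpu_tolerations_fragment(4)["tolerations"][0]["value"] == "4"
```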
t
You are right, I think (2) is the better approach. It would allow for injecting not only key/value but also the operator and effect.
y
yeah letting users access the pod template more readily sounds like the best solution. one reason is that I’d hope that this becomes less and less necessary to do as time goes on. what i mean is, what you’re really doing is better scaling/scheduling. 4 gpu nodes getting blocked by 1 gpu tasks is poor scheduling on k8s’s part. but scheduling four 1 gpu tasks on a 4 gpu node (i assume that’s possible?) might be the right thing to do. hopefully the schedulers and scalers in the future can better do this.
d
@Yee should we create an issue to track this? Seems like it may be a viable solution to expose a little more configuration - but should probably have some discussion right?
y
yeah we should
would you mind putting that in?