delightful-computer-49028
11/28/2022, 10:29 PM
k8s:
  plugins:
    # -- Configuration section for all K8s specific plugins [Configuration structure](<https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/pluginmachinery/flytek8s/config>)
    k8s:
      default-env-vars: []
      # DEFAULT_ENV_VAR: VALUE
      default-cpus: 100m
      default-memory: 100Mi
      resource-tolerations:
        - nvidia.com/gpu:
            - key: "num-gpus"
              operator: "Equal"
              value: 1
              effect: "NoSchedule"
        - nvidia.com/gpu:
            - key: "num-gpus"
              operator: "Equal"
              value: 4
              effect: "NoSchedule"
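For readers less familiar with tolerations, here is a minimal Python sketch of how Kubernetes decides whether a toleration covers a node taint. It is not Flyte or Kubernetes code. The `Taint`, `Toleration`, `tolerates`, and `schedulable` names are hypothetical, and the node taints in the example are assumed values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # an empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if this single toleration covers this single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # An empty key with Exists tolerates every taint.
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def schedulable(tolerations: Iterable[Toleration], taints: Iterable[Taint]) -> bool:
    """Every blocking taint on the node needs at least one matching toleration."""
    tolerations = list(tolerations)
    blocking = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in blocking)


# Assumed example: a node carrying two taints.
node_taints = [
    Taint("nvidia.com/gpu", "present", "NoSchedule"),
    Taint("num-gpus", "4", "NoSchedule"),
]
num_gpus_only = [Toleration(key="num-gpus", value="4", effect="NoSchedule")]
both = num_gpus_only + [
    Toleration(key="nvidia.com/gpu", operator="Exists", effect="NoSchedule")
]

fits_with_one = schedulable(num_gpus_only, node_taints)
fits_with_both = schedulable(both, node_taints)
```

In this sketch a Pod that only tolerates `num-gpus` would not fit the node, because each blocking taint has to be matched separately.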
thankful-minister-83577
delightful-computer-49028
11/29/2022, 12:33 AM
k8s:
  plugins:
    # -- Configuration section for all K8s specific plugins [Configuration structure](<https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/pluginmachinery/flytek8s/config>)
    k8s:
      default-env-vars: []
      # DEFAULT_ENV_VAR: VALUE
      default-cpus: 100m
      default-memory: 100Mi
      resource-tolerations:
        - nvidia.com/gpu:
            - key: "num-gpus"
              operator: "Equal"
              value: 1
              effect: "NoSchedule"
as I was testing single-GPU configuration. It works. I was able to run tasks on GKE nodes with taints nvidia.com/gpu=present and num-gpus=1
delightful-computer-49028
11/29/2022, 12:34 AM
thankful-minister-83577
delightful-computer-49028
11/29/2022, 12:37 AM
thankful-minister-83577
thankful-minister-83577
delightful-computer-49028
11/29/2022, 12:38 AM
delightful-computer-49028
11/29/2022, 12:38 AM
thankful-minister-83577
thankful-minister-83577
thankful-minister-83577
delightful-computer-49028
11/29/2022, 12:39 AM
delightful-computer-49028
11/29/2022, 12:40 AM
thankful-minister-83577
delightful-computer-49028
11/29/2022, 12:41 AM
thankful-minister-83577
delightful-computer-49028
11/29/2022, 12:45 AM
delightful-computer-49028
11/29/2022, 12:47 AM
hallowed-mouse-14616
11/29/2022, 9:12 AM
delightful-computer-49028
11/29/2022, 3:57 PM
freezing-airport-6809
freezing-airport-6809
freezing-airport-6809
freezing-airport-6809
freezing-airport-6809
freezing-airport-6809
hallowed-mouse-14616
11/29/2022, 4:59 PM
task(request=Resources(), tolerations={"nvidia.com/gpu": present, "num_gpus": 1, ...})). You expect that this configuration can be parsed and that Flyte will set the correct tolerations on the Pod to satisfy the requirements? IMO this would be very difficult, and while it may fit your use-case well, it may not cover all taint/toleration differences.
I have a few thoughts here:
(1) If we are going to call them tolerations, which I think is the correct way to do it, then the spec should mimic tolerations (i.e. include operator, value, etc.) so that these can be set without requiring parsing and generating.
(2) I prefer to have a simple API that covers 99% of the use-cases and then offer more extensive configuration options to make everything available. The current setup (i.e. injecting a single toleration from configuration if GPUs are requested) covers the former. IMO for the latter it makes sense to make toleration configuration directly available to Python tasks. This could be done by using a PodTemplate to specify default Pod configuration. It would work similarly to the default PodTemplate work, where the template serves as the basis for the Pod. This pod_template (or whatever we decide to call it) could be exposed on ContainerTask, PythonTask, etc. to allow any Pod configuration. I think this solves a number of our problems. So this use-case could be set like (below), but you could easily use a function to generate the pod_template and just call a func to return it.
@task(
    request=Resources(),
    pod_template=V1PodTemplate(
        spec=V1PodSpec(
            tolerations=[
                V1Toleration(
                    key="num-gpus",
                    operator="Equal",
                    value="1",
                    effect="NoSchedule",
                ),
            ],
        ),
    ),
)
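The "use a function to generate the pod_template" idea could look roughly like the sketch below. It uses plain dicts as a stand-in for the `V1PodTemplate` / `V1PodSpec` / `V1Toleration` objects so that it runs without any dependencies. The helper names `gpu_toleration` and `gpu_pod_template` are hypothetical, and a `pod_template` argument on `@task` is the proposal from this thread, not something this sketch assumes already exists.

```python
from typing import Any, Dict


def gpu_toleration(num_gpus: int) -> Dict[str, str]:
    """Build one toleration for the num-gpus taint (hypothetical helper)."""
    return {
        "key": "num-gpus",
        "operator": "Equal",
        # Kubernetes toleration values are strings, so the count is stringified.
        "value": str(num_gpus),
        "effect": "NoSchedule",
    }


def gpu_pod_template(num_gpus: int) -> Dict[str, Any]:
    """Return a dict shaped like a pod template that carries only tolerations."""
    return {"spec": {"tolerations": [gpu_toleration(num_gpus)]}}


# Assumed usage: one template per node-pool size.
single_gpu = gpu_pod_template(1)
quad_gpu = gpu_pod_template(4)
```

With something like this, each task would pass the result of `gpu_pod_template(n)` instead of repeating the full toleration block.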
hallowed-mouse-14616
11/29/2022, 5:00 PM
delightful-computer-49028
11/29/2022, 5:21 PM
thankful-minister-83577
hallowed-mouse-14616
11/30/2022, 5:28 PM
thankful-minister-83577
thankful-minister-83577
hallowed-mouse-14616
11/30/2022, 7:22 PM