quaint-diamond-37493
10/19/2022, 7:21 PM
Can I set the runtimeClassName to nvidia so that it can actually use the GPU? I added requests=Resources(gpu="1"), but can I add the runtimeClassName?
freezing-airport-6809
rapid-vegetable-16315
10/20/2022, 6:35 AM
pod_resources = Resources(cpu="3", mem="20Gi", gpu="1")
export_terrain_texture_container_task = CustomContainerTask(
    requests=pod_resources,
    limits=pod_resources,
)

class CustomContainerTask(ContainerTask):
    def __init__(
        self,
        requests: Optional[Resources] = None,
        limits: Optional[Resources] = None,
        **kwargs: Any,
    ):
        super().__init__(
            <<stuff>>
            requests=requests,
            limits=limits,
            **kwargs,
        )

    def get_container(self, settings: SerializationSettings) -> _task_model.Container:
        env = {**settings.env, **self.environment} if self.environment else settings.env
        return _get_container_definition(
            image=self._image,
            command=self._cmd,
            args=self._args,
            data_loading_config=_task_model.DataLoadingConfig(
                input_path=self._input_data_dir,
                output_path=self._output_data_dir,
                format=self._md_format.value,
                enabled=True,
                io_strategy=self._io_strategy.value if self._io_strategy else None,
            ),
            environment=env,
            cpu_request=self.resources.requests.cpu,
            cpu_limit=self.resources.limits.cpu,
            memory_request=self.resources.requests.mem,
            memory_limit=self.resources.limits.mem,
            gpu_request=self.resources.requests.gpu,
            gpu_limit=self.resources.limits.gpu,
            ephemeral_storage_request=self.resources.requests.ephemeral_storage,
            ephemeral_storage_limit=self.resources.limits.ephemeral_storage,
        )
rapid-vegetable-16315
10/20/2022, 6:37 AM
The important part is the get_container override, since the original does not pass the GPU resources through, i.e. these lines:
gpu_request=self.resources.requests.gpu,
gpu_limit=self.resources.limits.gpu,
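For anyone reading along, here is a rough sketch of where the two settings discussed above end up in a Kubernetes pod. It uses plain dicts only and is not flytekit or Flyte code: `build_pod_spec` and the image name are hypothetical, made up for illustration. The point it tries to show is that the GPU count is a container-level resource (`nvidia.com/gpu`), while `runtimeClassName` is a field on the pod spec itself, so a resource request alone cannot express it.

```python
import json
from typing import Any, Dict, Optional

# Extended resource name for NVIDIA GPUs, as quoted later in this thread.
GPU_RESOURCE = "nvidia.com/gpu"


def build_pod_spec(
    image: str,
    cpu: str,
    mem: str,
    gpu: Optional[str] = None,
    runtime_class_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a Kubernetes PodSpec as a plain dict."""
    resources = {"cpu": cpu, "memory": mem}
    if gpu is not None:
        # The GPU count lives next to cpu/memory in the container resources.
        resources[GPU_RESOURCE] = gpu
    container = {
        "name": "main",
        "image": image,
        # For extended resources such as GPUs, requests and limits must match,
        # so the same values are used for both here.
        "resources": {"requests": dict(resources), "limits": dict(resources)},
    }
    spec: Dict[str, Any] = {"containers": [container]}
    if runtime_class_name is not None:
        # runtimeClassName is a pod-level field, not a container resource.
        spec["runtimeClassName"] = runtime_class_name
    return spec


gpu_spec = build_pod_spec(
    "example/image:latest", cpu="3", mem="20Gi", gpu="1", runtime_class_name="nvidia"
)
cpu_spec = build_pod_spec("example/image:latest", cpu="3", mem="20Gi")
print(json.dumps(gpu_spec, indent=2))
```

Without `gpu` and `runtime_class_name`, the resulting spec has neither the `nvidia.com/gpu` entry nor `runtimeClassName`.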
quaint-diamond-37493
10/20/2022, 7:44 AM
quaint-diamond-37493
10/20/2022, 7:47 AM
nvidia.com/gpu: "1" and runtimeClassName: nvidia are added to the container, and if I don't request a GPU, neither are.
freezing-airport-6809
quaint-diamond-37493
10/20/2022, 4:16 PM
quaint-diamond-37493
10/20/2022, 4:22 PM
nvidia/gpu or is that "hardcoded" in favor of NVIDIA GPUs?
Regarding tolerations, I already set up the ExtendedResourceToleration admission controller, which adds tolerations automatically. So in my case I don't have to handle that in Flyte separately.
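To make the toleration part concrete, here is a minimal sketch of roughly what the ExtendedResourceToleration admission controller does, as I understand it: for every extended resource a pod requests, it adds a toleration keyed on that resource name. This is plain Python over dict-shaped pod specs, not the actual Kubernetes implementation. The function names are hypothetical and the extended-resource check is deliberately simplified.

```python
from typing import Any, Dict, List


def is_extended_resource(name: str) -> bool:
    # Simplified assumption: a domain-prefixed name outside kubernetes.io
    # (for example "nvidia.com/gpu") counts as an extended resource.
    return "/" in name and "kubernetes.io/" not in name


def add_extended_resource_tolerations(pod_spec: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the admission controller's mutation step."""
    containers: List[Dict[str, Any]] = (
        pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    )
    names = set()
    for container in containers:
        requests = container.get("resources", {}).get("requests", {})
        names.update(n for n in requests if is_extended_resource(n))

    if names:
        tolerations = pod_spec.setdefault("tolerations", [])
        for name in sorted(names):
            toleration = {"key": name, "operator": "Exists", "effect": "NoSchedule"}
            if toleration not in tolerations:  # keep the mutation idempotent
                tolerations.append(toleration)
    return pod_spec


gpu_pod = add_extended_resource_tolerations(
    {"containers": [{"name": "main", "resources": {"requests": {"cpu": "3", "nvidia.com/gpu": "1"}}}]}
)
cpu_pod = add_extended_resource_tolerations(
    {"containers": [{"name": "main", "resources": {"requests": {"cpu": "3"}}}]}
)
print(gpu_pod["tolerations"])
```

This only helps if the GPU nodes are tainted with the resource name as the taint key. Pods that request no extended resource are left untouched.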
One other idea for the runtimeClassName could be to set up something similar that adds it automatically in k8s: basically a customized ExtendedResourceToleration admission controller, or just an additional one...
quaint-diamond-37493
10/20/2022, 5:36 PM
quaint-diamond-37493
10/20/2022, 8:23 PM
runtimeClassName: nvidia
This works for now and I can run GPU tasks, but it also means that Flyte tasks will always run with the nvidia runtime (and hence will only be scheduled to nodes that have this runtime, and hence a GPU).
freezing-airport-6809