# ask-the-community
a
Hi! I am trying to override the preset k8s specs of a task inside a dynamic workflow and cannot find a way to make it work correctly. I suspect it could come from the container name not matching the one that is created for the given task. Do you know how to do it? Here is the code that I used:
```python
from flytekit import PodTemplate
from kubernetes.client import (V1Affinity, V1Container, V1NodeAffinity, V1NodeSelector,
                               V1NodeSelectorRequirement, V1NodeSelectorTerm, V1PodSpec,
                               V1ResourceRequirements)

# Fragment from inside the @dynamic workflow body
if training_args.force_a100_gpus:
    return train(input_args).with_overrides(
        pod_template=PodTemplate(
            pod_spec=V1PodSpec(
                affinity=V1Affinity(
                    node_affinity=V1NodeAffinity(
                        required_during_scheduling_ignored_during_execution=V1NodeSelector(
                            node_selector_terms=[
                                V1NodeSelectorTerm(match_expressions=[
                                    V1NodeSelectorRequirement(
                                        key="cloud.google.com/gke-accelerator",  # GKE GPU node label
                                        operator="In",
                                        values=["nvidia-tesla-a100"],
                                    )
                                ])
                            ]
                        )
                    )
                ),
                restart_policy="Never",  # Kubernetes restart policies are capitalized
                containers=[
                    V1Container(
                        name="primary",  # PodTemplate's default primary container name
                        image="{{.image.imagename.fqn}}:{{.image.imagename.version}}",
                        resources=V1ResourceRequirements(
                            limits={"nvidia.com/gpu": "1"},
                            requests={"memory": "...", "cpu": "..."},
                        ),
                    )
                ],
            )
        ),
        container_image=None,
        requests=None,
    )
```
The container name for the `train` task was `<executionID>-<workflowID>-0-dn2-0`.
e
The `container_image` argument doesn't match the `image` for the primary container. I think this might be the problem.
k
How can we improve this experience?
f
I think you can just leave out `containers` in the `pod_template`.
a
`containers` is a mandatory argument in `V1PodSpec`.
I still haven't found a fix 😞
f
But I think you can provide an empty list: `containers=[]`.
Also, which version are you running, and what actually gets launched in the end?
d
@Arthur Lindoulsi what values are you specifically trying to update? Just node selectors and resource requests?
What behavior are you specifically seeing? The `pod_template` argument in the `@task` decorator is applied statically, meaning it is built into the task definition. So I doubt it will work with `with_overrides`.
a
Just the GPU type, actually, so the node selector requirement values.
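Since the GPU type is the only thing that varies, the node-affinity part of the pod spec can be built from a single parameter. A minimal sketch, assuming the spec is written as a plain dict in the Kubernetes JSON (camelCase) shape rather than with the `kubernetes.client` model classes; the helper names `gpu_pod_spec` and `accelerator_values` are hypothetical. `containers` is left as an empty list, following the `containers=[]` suggestion in the thread, on the assumption that Flyte supplies the task's primary container itself.

```python
from typing import Any, Dict, List

# Node label that GKE sets to the accelerator type of each GPU node.
GKE_ACCELERATOR_LABEL = "cloud.google.com/gke-accelerator"


def gpu_pod_spec(gpu_type: str) -> Dict[str, Any]:
    """Hypothetical helper: a pod spec (Kubernetes JSON shape) that only
    schedules onto nodes whose accelerator label equals `gpu_type`."""
    requirement = {
        "key": GKE_ACCELERATOR_LABEL,
        "operator": "In",
        "values": [gpu_type],
    }
    return {
        # Left empty, assuming Flyte injects the task's primary container.
        "containers": [],
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{"matchExpressions": [requirement]}]
                }
            }
        },
    }


def accelerator_values(spec: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: read back the accelerator values a spec requires."""
    terms = spec["affinity"]["nodeAffinity"][
        "requiredDuringSchedulingIgnoredDuringExecution"]["nodeSelectorTerms"]
    return [
        value
        for term in terms
        for expr in term["matchExpressions"]
        if expr["key"] == GKE_ACCELERATOR_LABEL
        for value in expr["values"]
    ]
```

For example, `accelerator_values(gpu_pod_spec("nvidia-tesla-a100"))` returns `["nvidia-tesla-a100"]`. With `kubernetes.client`, the same structure maps onto `V1PodSpec(containers=[], affinity=V1Affinity(...))` as in the original snippet.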
d
Could you register two separate tasks (one with A100s and one without) and use a conditional to choose which one to call at runtime? This would have the added bonus of not incurring the overhead of dynamic tasks (i.e., starting a separate Pod to dynamically compile the underlying workflow and then run it).
This could be something like:
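```python
from flytekit import PodTemplate, conditional, task, workflow


def gpu_training():
    # Shared training logic (omitted)
    ...


@task(pod_template=PodTemplate(
    # omitted: pod spec pinning the task to A100 nodes
))
def train_on_a100():
    gpu_training()


@task()
def train_on_normal_gpu():
    gpu_training()


@workflow
def wf(force_a100_gpus: bool):
    # Both tasks are registered ahead of time; the branch is chosen at runtime.
    (
        conditional("gpu")
        .if_(force_a100_gpus.is_true())
        .then(train_on_a100())
        .else_()
        .then(train_on_normal_gpu())
    )
```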
a
I will try it, thanks!