Hi I am trying to override my preset k8s specs of a task ins Flyte #flyte-support


strong-painting-53828

03/30/2023, 8:39 AM

Hi! I am trying to override the preset k8s spec of a task inside a dynamic workflow and cannot find a way to do it correctly. I suspect the problem is that the container name does not match the one created for the task. Do you know how to do it? Here is the code I used:


# Fragment from the body of the dynamic workflow; imports added for completeness.
from flytekit import PodTemplate
from kubernetes.client import (
    V1Affinity, V1Container, V1NodeAffinity, V1NodeSelector,
    V1NodeSelectorRequirement, V1NodeSelectorTerm, V1PodSpec,
    V1ResourceRequirements,
)

if training_args.force_a100_gpus:
    return train(input_args).with_overrides(
        pod_template=PodTemplate(
            pod_spec=V1PodSpec(
                affinity=V1Affinity(
                    node_affinity=V1NodeAffinity(
                        required_during_scheduling_ignored_during_execution=V1NodeSelector(
                            node_selector_terms=[
                                V1NodeSelectorTerm(match_expressions=[
                                    V1NodeSelectorRequirement(
                                        key="cloud.google.com/gke-accelerator",
                                        operator="In",
                                        values=["nvidia-tesla-a100"])])]))),
                restart_policy="Never",  # Kubernetes expects "Never", not "never"
                containers=[V1Container(
                    name="primary",
                    image="{{.image.imagename.fqn}}:{{.image.imagename.version}}",
                    resources=V1ResourceRequirements(
                        limits={"nvidia.com/gpu": "1"},
                        requests={"memory": "...", "cpu": "..."}))])),
        container_image=None,
        requests=None,
    )

Container name for the train task was "<executionID>-<workflowID>-0-dn2-0"

melodic-magician-71351

03/30/2023, 10:13 AM

The `container_image` argument doesn't match the `image` for the primary container. I think this might be a problem?

freezing-airport-6809

03/30/2023, 1:14 PM

How can we improve this experience?

quaint-diamond-37493

03/30/2023, 1:25 PM

I think you can just leave out `containers` in the `pod_template`.

strong-painting-53828

03/30/2023, 1:52 PM

containers is a mandatory argument in V1PodSpec

strong-painting-53828

03/30/2023, 1:53 PM

I still did not find a fix 😞

quaint-diamond-37493

03/30/2023, 2:21 PM

But I think you can provide an empty list: `containers=[]`

quaint-diamond-37493

03/30/2023, 2:22 PM

also which version are you running and what is actually launched in the end?

hallowed-mouse-14616

03/30/2023, 2:50 PM

@strong-painting-53828 what values are you specifically trying to update? Just node selectors and resource requests?

hallowed-mouse-14616

03/30/2023, 2:51 PM

What behavior are you specifically seeing? The `pod_template` argument in the `@task` decorator is applied statically, meaning it is built into the task definition. So I don't suspect it will work with `with_overrides`.

strong-painting-53828

03/30/2023, 2:51 PM

Just the GPU type actually, so the node selector requirement values.

hallowed-mouse-14616

03/30/2023, 2:53 PM

Could you register two separate tasks (one with A100 and one without) and use a conditional to choose which one to call at runtime? This would have the added bonus of not incurring the overhead of dynamic tasks (i.e. starting a separate Pod to dynamically compile the underlying workflow and run it).

hallowed-mouse-14616

03/30/2023, 2:55 PM

This could be something like:


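from flytekit import PodTemplate, conditional, task, workflow

def gpu_training(...):
    # omitted

@task(pod_template=PodTemplate(
        # omitted
    ))
def train_on_a100():
    gpu_training()

@task
def train_on_normal_gpu():
    gpu_training()

@workflow
def wf(force_a100_gpus: bool):
    # Workflow inputs are promises, so the bool is tested with is_true();
    # the chain is parenthesized and each branch calls its task.
    (
        conditional("gpu")
        .if_(force_a100_gpus.is_true())
        .then(train_on_a100())
        .else_()
        .then(train_on_normal_gpu())
    )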
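
For reference, the only thing that differs between the two tasks in this thread is the node selector requirement on the GPU type. Below is a minimal sketch of that requirement, written with plain dicts that mirror the Kubernetes manifest field names rather than the `kubernetes.client` classes, so it runs without a cluster or extra packages. The helper name `gpu_node_affinity` is hypothetical; the label key and the `nvidia-tesla-a100` value are taken from the question above. Note that `PodTemplate` itself takes a `V1PodSpec`, so this only illustrates the structure and is not a drop-in argument.

```python
import json

# Node label key used in the original question (GKE-specific).
GKE_ACCELERATOR_LABEL = "cloud.google.com/gke-accelerator"


def gpu_node_affinity(accelerator_types):
    """Build a pod `affinity` mapping that requires one of the given accelerator types.

    Hypothetical helper for illustration only; keys follow the Kubernetes
    PodSpec manifest schema (camelCase).
    """
    if not accelerator_types:
        raise ValueError("at least one accelerator type is required")
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": GKE_ACCELERATOR_LABEL,
                                "operator": "In",
                                "values": list(accelerator_types),
                            }
                        ]
                    }
                ]
            }
        }
    }


# The A100 variant from the thread; a second task would simply omit this affinity.
a100_affinity = gpu_node_affinity(["nvidia-tesla-a100"])
print(json.dumps(a100_affinity, indent=2))
```

Keeping the GPU type as the single parameter makes it clear that the A100 task and the default task can share everything else in their definitions.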

strong-painting-53828

03/30/2023, 3:09 PM

I will try it thanks
