Hey quick question how would you add a `nodeSelector` for a Flyte #announcements

jolly-whale-9142

02/11/2022, 3:52 PM

Hey quick question, how would you add a `nodeSelector` for a specific workflow/task? I’m not sure if taints are enough to trigger our cluster-autoscaler. I wanted to edit the pods on the fly but I can’t 😕

jolly-whale-9142

02/11/2022, 4:19 PM

We already defined the taints as described in the documentation: https://docs.flyte.org/projects/cookbook/en/latest/auto/deployment/customizing_resources.html#customizing-task-resources

freezing-airport-6809

02/11/2022, 4:24 PM

Ohh you mean tolerations or default node selectors?

freezing-airport-6809

02/11/2022, 4:24 PM

Cc @freezing-boots-56761, do you guys do something similar?

jolly-whale-9142

02/11/2022, 4:25 PM

Default node selector, we tried tolerations but we’re not sure that it’s triggering the autoscaler so we wanna try nodeSelector

freezing-airport-6809

02/11/2022, 4:29 PM

So ideally we will have to add default node selectors for GPU here: https://github.com/flyteorg/flyteplugins/blob/master/go/tasks/pluginmachinery/flytek8s/config/config.go
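
For context, the linked `config.go` backs the `plugins.k8s` section of the Propeller configuration. A minimal sketch of a default node selector there could look like the fragment below. It assumes the field is exposed under the key `default-node-selector`, which should be checked against the file above, and it reuses the node group label that appears later in this thread. Note that a default selector would apply to every pod the plugin launches, not only to GPU tasks, which is why it does not solve the per-task case asked about here.

```yaml
# Sketch only: key name assumed from the linked config.go; verify before use.
plugins:
  k8s:
    # Applied to ALL pods launched by the plugin, not just GPU tasks.
    default-node-selector:
      eks.amazonaws.com/nodegroup: nodes-name-GPU
```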

freezing-airport-6809

02/11/2022, 4:29 PM

Don't love it, but the only other option is to use a pod task.

jolly-whale-9142

02/11/2022, 4:37 PM

So I can’t do it for now correct?

jolly-whale-9142

02/11/2022, 4:46 PM

Also, I have a question about taints. We have the same config for Propeller as the one defined at https://github.com/flyteorg/flytepropeller/blob/master/propeller-config.yaml#L51-L56. We defined our taints in Terraform as:

taints = {
        dedicated = {
          key    = "flyte/gpu"
          value  = "dedicated"
          effect = "NO_SCHEDULE"
        }
      }

but I’m not sure if it’s correct or it should be

taints = {
        nvidia.com/gpu = {
          key    = "flyte/gpu"
          value  = "dedicated"
          effect = "NO_SCHEDULE"
        }
      }

🤔 Terraform plan has the same output
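
For comparison, here is a sketch of the Propeller side that this taint has to match. It assumes the `resource-tolerations` layout from the Flyte docs linked earlier in the thread, with the key, value and effect copied from the Terraform snippets above. `NO_SCHEDULE` is the EKS API spelling of the Kubernetes effect `NoSchedule`. The identical plan output suggests the outer map key (`dedicated` vs `nvidia.com/gpu`) is only a label in Terraform; the `key`, `value` and `effect` fields are what define the taint.

```yaml
# Sketch only: layout assumed from the Flyte "customizing task resources" docs.
plugins:
  k8s:
    resource-tolerations:
      # Added to any task pod that requests this resource.
      nvidia.com/gpu:
        - key: flyte/gpu        # must equal the taint key on the node group
          operator: Equal
          value: dedicated      # must equal the taint value
          effect: NoSchedule    # Kubernetes form of EKS "NO_SCHEDULE"
```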

freezing-airport-6809

02/11/2022, 4:47 PM

Cc @high-park-82026

high-park-82026

02/11/2022, 9:13 PM

The propeller config looks correct.. Which cluster autoscaler do you use?

jolly-whale-9142

02/14/2022, 6:54 AM

We’re using https://github.com/kubernetes/autoscaler/releases v1.22.1

jolly-whale-9142

02/14/2022, 8:31 AM

FYI, using a NodeSelector does trigger a scale-up on a dummy deployment

Events:
  Type     Reason            Age                From                Message
  ----     ------            ----               ----                -------
  Warning  FailedScheduling  27s (x2 over 28s)  default-scheduler   0/4 nodes are available: 4 node(s) didn't match Pod's node affinity/selector.
  Normal   TriggeredScaleUp  27s                cluster-autoscaler  pod triggered scale-up: [{eks-nodes-g4dn-xlarge-9cbf749a-540f-373d-21cb-800bbc70bea0 0->1 (max: 1)}]

NodeSelector:

nodeSelector:
    eks.amazonaws.com/nodegroup: nodes-name-GPU
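
For anyone reproducing the test, a dummy deployment along these lines should be enough. This is a sketch rather than the manifest actually used: the deployment name, labels and pause image are hypothetical, and the toleration assumes the node group carries the `flyte/gpu=dedicated:NoSchedule` taint discussed above.

```yaml
# Sketch only: names and image are placeholders, not the original manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-scale-test          # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-scale-test
  template:
    metadata:
      labels:
        app: gpu-scale-test
    spec:
      nodeSelector:
        eks.amazonaws.com/nodegroup: nodes-name-GPU
      tolerations:              # needed only if the node group is tainted
        - key: flyte/gpu
          operator: Equal
          value: dedicated
          effect: NoSchedule
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
```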

freezing-boots-56761

02/14/2022, 5:42 PM

@jolly-whale-9142: is this resolved now?

freezing-boots-56761

02/14/2022, 5:42 PM

Tolerations should suffice to scale up a node pool, assuming all other requirements match. Is it scaling up from 0 by any chance?
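
On the scale-from-0 point: with no running node to inspect, the cluster autoscaler on AWS relies on tags on the Auto Scaling group to learn which labels, taints and resources a new node would have. A sketch of such tags, written as a key/value map for readability, is below. The tag prefixes come from the cluster-autoscaler AWS documentation; the label, taint and GPU count are assumptions taken from the values in this thread, and whether the tags actually reach the Auto Scaling group depends on how the node group is provisioned.

```yaml
# Sketch only: Auto Scaling group tags, not a Kubernetes manifest.
k8s.io/cluster-autoscaler/node-template/label/eks.amazonaws.com/nodegroup: nodes-name-GPU
k8s.io/cluster-autoscaler/node-template/taint/flyte/gpu: "dedicated:NoSchedule"
k8s.io/cluster-autoscaler/node-template/resources/nvidia.com/gpu: "1"   # assumed GPU count per node
```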

jolly-whale-9142

02/15/2022, 7:45 AM

Not yet, but we are still investigating. We are checking this issue and trying to fix a couple of things: https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1558

high-park-82026

02/17/2022, 9:56 PM

@jolly-whale-9142 just got the chance to try out tolerations with gpu nodes and I see them scaling up and down just fine… have you managed to get to the root cause here?

jolly-whale-9142

02/21/2022, 7:29 AM

Hey sorry I forgot to reply — I stopped working on that for now because I had to focus on something else but I should come back to it soon(ish). I realised we also had to install the different Nvidia drivers etc, I naively assumed that we didn’t need to do that 😅
