Hey quick question, how would you add a `nodeSelec...
# announcements
Hey quick question, how would you add a `nodeSelector` for a specific workflow/task? I’m not sure if taints are enough to trigger our cluster-autoscaler. I wanted to edit the pods on the fly but I can’t 😕
Ohh you mean tolerations or default node selectors?
Cc @jeev do you guys do something similar?
Default node selector, we tried tolerations but we’re not sure that it’s triggering the autoscaler so we wanna try nodeSelector
Don't love it, but the only other option is to use a pod task
So I can’t do it for now, correct?
Also I have a question about taints. We have the same config for Propeller as the one defined at https://github.com/flyteorg/flytepropeller/blob/master/propeller-config.yaml#L51-L56. We defined our taints in Terraform as:
taints = {
        dedicated = {
          key    = "flyte/gpu"
          value  = "dedicated"
          effect = "NO_SCHEDULE"
but I’m not sure if it’s correct or it should be
taints = {
        nvidia.com/gpu = {
          key    = "flyte/gpu"
          value  = "dedicated"
          effect = "NO_SCHEDULE"
🤔 Terraform plan has the same output
Cc @Haytham Abuelfutuh
The propeller config looks correct. Which cluster autoscaler do you use?
FYI, using a NodeSelector does trigger a scale-up on a dummy deployment
  Type     Reason            Age                From                Message
  ----     ------            ----               ----                -------
  Warning  FailedScheduling  27s (x2 over 28s)  default-scheduler   0/4 nodes are available: 4 node(s) didn't match Pod's node affinity/selector.
  Normal   TriggeredScaleUp  27s                cluster-autoscaler  pod triggered scale-up: [{eks-nodes-g4dn-xlarge-9cbf749a-540f-373d-21cb-800bbc70bea0 0->1 (max: 1)}]
    eks.amazonaws.com/nodegroup: nodes-name-GPU
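For anyone wanting to reproduce the dummy-deployment test, here is a minimal sketch that builds such a manifest in Python and prints it as JSON (which `kubectl apply -f` accepts). The deployment name `dummy-gpu-scaleup` and the pause image are assumptions chosen for illustration; only the `eks.amazonaws.com/nodegroup: nodes-name-GPU` selector comes from the thread.

```python
import json


def dummy_deployment(name: str, node_selector: dict) -> dict:
    """Build a one-replica Deployment whose pod is pinned by nodeSelector."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # The pod stays Pending until a node with these labels exists,
                    # which is what the cluster-autoscaler reacts to.
                    "nodeSelector": node_selector,
                    "containers": [
                        # Placeholder container (assumed image); it does no work.
                        {"name": "pause", "image": "registry.k8s.io/pause:3.9"}
                    ],
                },
            },
        },
    }


# Hypothetical name; selector label and value taken from the thread.
manifest = dummy_deployment(
    "dummy-gpu-scaleup",
    {"eks.amazonaws.com/nodegroup": "nodes-name-GPU"},
)
print(json.dumps(manifest, indent=2))
```

If the node group is tainted, this pod would also need a matching toleration before the autoscaler considers the group a fit.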
@Stephen: is this resolved now?
tolerations should suffice to scale up a node pool assuming all other requirements match. is it scaling up from 0 by any chance?
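To make "all other requirements match" a bit more concrete, here is a rough sketch of how Kubernetes decides whether a pod's tolerations cover a node's taints. It is a simplified model of the documented matching rules, not the scheduler's actual code. The taint below mirrors the Terraform values from earlier in the thread, written with the Kubernetes effect name `NoSchedule` (the EKS API spells it `NO_SCHEDULE`); the toleration is an assumed counterpart, since the real Propeller config is not reproduced here.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty or missing effect on the toleration matches every effect.
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")  # Equal is the default
    key = toleration.get("key")
    if operator == "Exists":
        # Exists ignores the value; an empty key matches every taint key.
        return not key or key == taint["key"]
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def schedulable(tolerations: list, taints: list) -> bool:
    """True if every hard taint on the node is tolerated by the pod."""
    # PreferNoSchedule is only a soft preference, so it never blocks scheduling.
    hard = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


gpu_taint = {"key": "flyte/gpu", "value": "dedicated", "effect": "NoSchedule"}
# Assumed toleration, mirroring the taint above.
gpu_toleration = {
    "key": "flyte/gpu",
    "operator": "Equal",
    "value": "dedicated",
    "effect": "NoSchedule",
}

print(schedulable([gpu_toleration], [gpu_taint]))  # tolerated
print(schedulable([], [gpu_taint]))                # blocked by the taint
```

Note that a toleration only permits a pod to land on tainted nodes; it does not steer the pod there. If the pod also fits on an existing untainted node, nothing is left pending and the autoscaler has no reason to scale the GPU group, which is where a GPU resource request or a node selector comes in.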
Not yet, but we are still investigating. We are checking this issue and trying to fix a couple of things: https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1558
@Stephen just got the chance to try out tolerations with gpu nodes and I see them scaling up and down just fine… have you managed to get to the root cause here?
Hey sorry I forgot to reply — I stopped working on that for now because I had to focus on something else but I should come back to it soon(ish). I realised we also had to install the different Nvidia drivers etc, I naively assumed that we didn’t need to do that 😅