Is there a way to make Flyte use nodeAffinity instead of nod Flyte #flyte-deployment

gray-ocean-62145

07/13/2023, 2:10 PM

Is there a way to make Flyte use nodeAffinity instead of nodeSelector for spot nodes (when `interruptible=True`)? And if not, is this something that would be considered? My justification is this: with nodeSelector, it is easy to make a task unschedulable. For example, we do not run GPUs on spot nodes, so a GPU task with `interruptible=True` will never be schedulable. With nodeAffinity instead of nodeSelector, the user could configure this behaviour by choosing either `requiredDuringSchedulingIgnoredDuringExecution` or `preferredDuringSchedulingIgnoredDuringExecution`.
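To make the difference between the two modes concrete, here is a minimal sketch that builds the `affinity` section of a pod spec as plain Python dicts in the Kubernetes JSON shape. The label key `example.com/capacity-type` and value `spot` are hypothetical placeholders, as is the helper name `spot_node_affinity`; substitute whatever label your spot nodes actually carry. Nothing here is Flyte's own behaviour.

```python
import json

# Hypothetical node label marking spot capacity; replace with your cluster's label.
SPOT_LABEL_KEY = "example.com/capacity-type"
SPOT_LABEL_VALUE = "spot"


def spot_node_affinity(required: bool, weight: int = 100) -> dict:
    """Build a pod-spec `affinity` block that targets spot nodes.

    required=True  -> hard constraint: the pod stays Pending if no spot node fits.
    required=False -> soft preference: the scheduler may fall back to other nodes.
    """
    if not 1 <= weight <= 100:
        raise ValueError("weight must be in the range 1-100")

    # One selector term matching nodes that carry the spot label.
    term = {
        "matchExpressions": [
            {"key": SPOT_LABEL_KEY, "operator": "In", "values": [SPOT_LABEL_VALUE]}
        ]
    }

    if required:
        return {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [term]
                }
            }
        }
    return {
        "nodeAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": weight, "preference": term}
            ]
        }
    }


if __name__ == "__main__":
    # Print the soft-preference variant for inspection.
    print(json.dumps(spot_node_affinity(required=False), indent=2))
```

The required form behaves much like nodeSelector for the GPU case described above, while the preferred form would let a GPU task land on an on-demand node when no spot node can run it.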

elegant-australia-91422

07/13/2023, 4:59 PM

We wound up adding a layer of indirection between our Flyte workloads and scheduling with Kyverno (specifically mutate rules). The idea is that you tag your workloads with a well-known annotation, and the policy can then apply an arbitrary transformation to the pod spec if the annotation matches. We use this right now as a level of indirection between model pipeline developers and decisions about what types of nodes we schedule on (on-demand vs. spot, etc.). It's certainly possible to use this to manage `nodeAffinity` as well.

elegant-australia-91422

07/13/2023, 4:59 PM

The Kyverno helm chart is pretty straightforward to deploy as well.

gray-ocean-62145

07/13/2023, 5:01 PM

Ah this is a good idea! We’re already using kyverno actually, but using mutations for this hadn’t crossed my mind!

elegant-australia-91422

07/13/2023, 9:40 PM

Cool! Feel free to hit me up with questions, we're using kyverno for this sort of indirection between workloads and cloud provider/autoscaler implementation details pretty extensively

hallowed-camera-82098

10/02/2023, 9:23 PM

@elegant-australia-91422 Sorry for resurrecting this old thread. What kinds of fields did you manage to mutate with Kyverno? We're getting a lot of this in flytepropeller: "error syncing Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds`, `spec.tolerations`".

elegant-australia-91422

10/02/2023, 9:24 PM

So, just last week I had to rip out the kyverno pieces exactly due to this finalizer issue.

hallowed-camera-82098

10/02/2023, 9:24 PM

shucks thanks

elegant-australia-91422

10/02/2023, 9:24 PM

I switched to creating higher-order decorators that resolve to a specific `PodTemplate`, so we still have the separation of concerns between pipeline authors and lower-level infra details (they're separated by codeowners in our monorepo), but unfortunately we couldn't fully handle this on the cluster side.

👍 1

elegant-australia-91422

10/02/2023, 9:25 PM

That said, `PodTemplate` seems much more functional than the previous hand-rolled PodTask we'd written, and you can specify a full pod spec properly there (including nodeAffinity terms, podAntiAffinity, etc.).

🙏 1
