# flyte-support
b
Is there a way to force the flyte console/propeller/etc. (basically all of the stuff that's always running) onto a specific node-pool? I'm on GCP. I have a default node-pool that I thought it would use, but I created a beefier node pool for some larger tasks, and then everything moved to that node-pool, so our costs went up. Also, are there minimum specs recommended for running all of that? We'd love to get the constantly running pods as cheap as we can
a
I'm also on GCP with Helm, and your setup sounds quite familiar. For as cheap as possible, ARM nodes are the cheapest I've found. That also answers your other question: with taints/tolerations you can force the deployments onto one node pool, because no other pool will accept them otherwise. For example, ARM nodes spawn by default with the NoSchedule taint:
kubernetes.io/arch=arm64
When you add this to the tolerations of the always-running deployments, they will move there. Also make sure to tag/taint your beefy instances so nothing unwanted can move there.
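To make that concrete, here's a sketch of the two pieces (the node-pool label and taint key/value are assumptions for illustration; the toleration matches GKE's default ARM taint):

```yaml
# Taint the beefy pool so only workloads that explicitly tolerate it land there,
# e.g. (assumed pool name):
#   kubectl taint nodes -l cloud.google.com/gke-nodepool=beefy-pool \
#     dedicated=big-tasks:NoSchedule
#
# A deployment that should run on the ARM pool tolerates the default ARM taint:
tolerations:
  - key: kubernetes.io/arch
    operator: Equal
    value: arm64
    effect: NoSchedule
```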
b
Awesome, that sounds great. I'll give arm a try and see if it could work, and I'll make sure taints/tolerations are there for the different types. As for setting those for the always running deployments, how did you go about that? I'm using this setup: https://github.com/unionai-oss/deploy-flyte so it might be slightly different, but I should be able to change something in the values file
a
That's the values file from the repo you provided, with the tolerations added. (I might have messed up the indentation.) In summary, we have to add them on:
• flyteadmin
• datacatalog
• flytepropeller
• flytescheduler
• flyteconsole
But I'm not sure how you can set tolerations for:
• syncresources
• flyte-pod-webhook
These also need them, but they are not templated via values.
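For the five templated components, the values additions would look roughly like this (a sketch; the exact key paths depend on your flyte-core chart version, so double-check them against the chart's values.yaml):

```yaml
# Assumed flyte-core values layout; the YAML anchor just avoids repeating the block.
flyteadmin:
  tolerations: &arm-toleration
    - key: kubernetes.io/arch
      operator: Equal
      value: arm64
      effect: NoSchedule
datacatalog:
  tolerations: *arm-toleration
flytepropeller:
  tolerations: *arm-toleration
flytescheduler:
  tolerations: *arm-toleration
flyteconsole:
  tolerations: *arm-toleration
```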
b
Beautiful, I'll start with those top ones. And have you been running yours on ARM, or is that just the cheapest instance type you've seen?
a
Yeah, I'm running on ARM. Works like a charm. But as stated above, those two don't have values where you can add the toleration. I'll check if it's possible to overwrite in Terraform; otherwise we might have to open an MR to add these.
b
cool, thanks so much, that's really helpful
oh one more question. which arm machine type are you using, and how many nodes does it spawn for the base stuff?
a
```hcl
node_pools = flatten(
    [
      {
        name         = "default-pool"
        machine_type = "t2a-standard-2" # ~$60/month

        initial_node_count = 1
        min_count          = 1
        max_count          = 1

        auto_repair  = true
        auto_upgrade = true

        disk_size_gb = 30
        disk_type    = "pd-standard"
      },
....
```

this fits flyte, grafana, and a few other services.
b
oh ok, awesome, that's good to know
Ok perfect, I got those top ones working. I tried to get the others, but haven't been able to get them working via the values, so it may require an MR. Are you referencing docs for those values somewhere? I can't find them specifically right now.
@ambitious-air-47430 just curious if you were able to figure out how to do the same with `syncresources` and `flyte-pod-webhook`
a
Hey, I was not at my PC over the long weekend. I created an MR to fix this: https://github.com/flyteorg/flyte/pull/5386. Until it is merged, you could either fork the chart or manually apply the changes in the cluster. Another option would be to remove the taint on the ARM node and only taint your compute nodes, which you then allow via pod_template. Sadly, I did not find any way to patch the chart in Terraform.
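For the "manually apply the changes in the cluster" route, a strategic-merge patch is one way to do it until the MR lands (a sketch with assumed namespace and resource names; verify the actual resource kinds in your install, since for a CronJob like syncresources the tolerations sit deeper, under spec.jobTemplate.spec.template.spec):

```yaml
# toleration-patch.yaml — apply with e.g.:
#   kubectl -n flyte patch deployment flyte-pod-webhook \
#     --type merge --patch-file toleration-patch.yaml
# Note: this is a one-off fix; the next helm upgrade may revert it.
spec:
  template:
    spec:
      tolerations:
        - key: kubernetes.io/arch
          operator: Equal
          value: arm64
          effect: NoSchedule
```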
b
Ok, awesome, that looks great. Thanks for working on that!