# flyte-deployment
k
Hi all, working on setting up Flyte on GKE as part of a side project. Does anyone have a document on how to deploy the helm chart on a GKE cluster using the helm_release resource block of terraform? I'm trying to make sure all the changes are tracked via terraform so I can easily pull apart the infra after use.
v
I’m doing exactly the same thing right now. I’m not yet done with it, but we can rely on the GCP/GKE manual installation page for guidance: https://docs.flyte.org/en/v1.0.0/deployment/gcp/manual.html
You’ll need 7 components:
• IAM permissions
• TLS (cert-manager)
• ingress-nginx LB
• DNS (your provider of choice: GCP Cloud DNS, Cloudflare (has a terraform provider), etc.)
• CloudSQL database (PostgreSQL)
• GCS bucket
• Flyte itself
You’ll need 3 helm charts: flyte-core, ingress-nginx, cert-manager.
For the IAM permissions:
• google_service_account - service account
• google_project_iam_custom_role - role
• google_project_iam_member - attach role to service account
You may want to use for_each with these permissions based on the guide:
Copy code
locals {
  service_accounts = {
    flyteadmin = [
      "iam.serviceAccounts.signBlob",
      "storage.buckets.get",
      "storage.objects.create",
      "storage.objects.delete",
      "storage.objects.get",
      "storage.objects.getIamPolicy",
      "storage.objects.update",
    ],
    flytepropeller = [
      "storage.buckets.get",
      "storage.objects.create",
      "storage.objects.delete",
      "storage.objects.get",
      "storage.objects.getIamPolicy",
      "storage.objects.update",
    ],
    flytescheduler = [
      "storage.buckets.get",
      "storage.objects.create",
      "storage.objects.delete",
      "storage.objects.get",
      "storage.objects.getIamPolicy",
      "storage.objects.update",
    ],
    datacatalog = [
      "storage.buckets.get",
      "storage.objects.create",
      "storage.objects.delete",
      "storage.objects.get",
      "storage.objects.update",
    ],
    flyteworkflow = [
      "storage.buckets.get",
      "storage.objects.create",
      "storage.objects.delete",
      "storage.objects.get",
      "storage.objects.list",
      "storage.objects.update",
    ],
  }
}
The 3 helm charts:
• ingress-nginx - helm_release from https://kubernetes.github.io/ingress-nginx, chart name ingress-nginx (I use version 4.0.13)
• cert-manager - helm_release from https://charts.jetstack.io, chart name cert-manager, version v1.12.0. Note that cert-manager here is 1.12.0 instead of the 0.12.0 used in the documentation example; that’s because we need it to be compatible with newer versions of Kubernetes.
• flyte-core - helm_release from https://flyteorg.github.io/flyte, chart name flyte-core, with your preferred Flyte version; I use 1.7.0. Use the values from https://github.com/flyteorg/flyte/blob/master/charts/flyte-core/values-gcp.yaml
I recommend passing helm values using templatefile to allow dynamic configuration based on terraform values, here’s my example:
Copy code
values = templatefile("../infra-root-modules/helm-values/flyte.yaml", {
    project_id     = var.gcp_project
    db_host        = module.flyte-psql-instance[0].private_ip_address
    db_password    = sensitive(var.flyte_cluster_secrets["${var.environment}/flyte_sql_root_pw"]) # I use the carlpett sops provider for secrets, handle this however you prefer
    storage_bucket = module.flyte-storage[0].name
    host_name      = "flyte.${var.environment}.${var.domain}"
  })
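To answer the original helm_release question directly, here’s a rough sketch of how the flyte-core release itself might consume those values (the resource names, namespace, and the cert-manager release it depends on are assumptions from my setup, adjust to yours):
Copy code
resource "helm_release" "flyte_core" {
  name             = "flyte"
  repository       = "https://flyteorg.github.io/flyte"
  chart            = "flyte-core"
  version          = "1.7.0"
  namespace        = "flyte"
  create_namespace = true

  # helm_release expects a list of rendered values documents
  values = [
    templatefile("../infra-root-modules/helm-values/flyte.yaml", {
      project_id     = var.gcp_project
      db_host        = module.flyte-psql-instance[0].private_ip_address
      db_password    = sensitive(var.flyte_cluster_secrets["${var.environment}/flyte_sql_root_pw"])
      storage_bucket = module.flyte-storage[0].name
      host_name      = "flyte.${var.environment}.${var.domain}"
    })
  ]

  # cert-manager must be installed first, see the Certificate/CRD notes below
  depends_on = [helm_release.cert_manager]
}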
From my initial observation it seems that Flyte will automatically create the Certificate resource, but that’s a custom resource installed by cert-manager, so make sure you pass the helm value installCRDs: true to your cert-manager and have the flyte helm_release depends_on the cert-manager helm_release so it will be able to create the Certificate. You’ll also need to set up an Issuer first; in my case I prefer a ClusterIssuer because it lets you separate the cert-manager and flyte namespaces. Use the kubectl provider or the kubernetes_manifest resource with the kubernetes provider to create something like this (in my case I used a templatefile, so there are some placeholder values):
Copy code
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ${email}
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
    - selector: {}
      http01:
        ingress:
          class: ${ingress_class}
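If you go the kubernetes_manifest route, the wiring could look roughly like this (file path, resource name, and variable names are assumptions; also note that kubernetes_manifest validates against the cluster at plan time, so the cert-manager CRDs need to exist before this can be planned, which the kubectl provider is more forgiving about):
Copy code
resource "kubernetes_manifest" "letsencrypt_cluster_issuer" {
  # render the ClusterIssuer template above and decode it into an object
  manifest = yamldecode(templatefile("../infra-root-modules/manifests/cluster-issuer.yaml", {
    email         = var.acme_email
    ingress_class = "nginx"
  }))

  depends_on = [helm_release.cert_manager]
}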
If using GKE Autopilot, you’ll need to set this in the cert-manager values file as well (replace the placeholder):
Copy code
global:
  leaderElection:
    namespace: ${certmanager_namespace}
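For reference, a minimal sketch of what the cert-manager helm_release could look like with installCRDs and these values passed in (resource name, namespace, and values file path are assumptions):
Copy code
resource "helm_release" "cert_manager" {
  name             = "cert-manager"
  repository       = "https://charts.jetstack.io"
  chart            = "cert-manager"
  version          = "v1.12.0"
  namespace        = "cert-manager"
  create_namespace = true

  # Flyte's ingress needs the cert-manager CRDs (Certificate, ClusterIssuer) to exist
  set {
    name  = "installCRDs"
    value = "true"
  }

  # needed on GKE Autopilot so leases are created outside kube-system
  values = [
    templatefile("../infra-root-modules/helm-values/cert-manager.yaml", {
      certmanager_namespace = "cert-manager"
    })
  ]
}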
The leaderElection override lets cert-manager create leases in a namespace other than kube-system, because GKE Autopilot restricts access to the kube-system namespace. You will also probably need to configure at least 500m CPU requests in your flyte helm values.yaml, because Flyte uses pod anti-affinity, which requires a minimum of 500m CPU requests when using GKE Autopilot.
Your helm install will probably fail until DNS is set up; it seems necessary for the ingress to work, which Flyte also uses. So set up DNS however you like, I use Cloudflare for this.
Next there’s the CloudSQL database. Create a google_sql_database_instance, create a google_sql_database named ‘flyteadmin’, and create a google_sql_user. Pass the user’s name and the host IP address outputs from terraform to flyte using templatefile on your values.yaml file. It seems we can’t use DNS names here (or the connection name), at least not with private IPs, so I am using a static IP address for now.
Next there’s the GCS bucket, a simple google_storage_bucket.
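A rough sketch of those resources, assuming private-IP connectivity to the cluster’s VPC (names, tier, and region are placeholders, not my exact config; private IP also requires a service networking peering, not shown):
Copy code
resource "google_sql_database_instance" "flyte" {
  name             = "flyte-${var.environment}"
  database_version = "POSTGRES_14"
  region           = var.gcp_region
  settings {
    tier = "db-custom-1-3840"
    ip_configuration {
      ipv4_enabled    = false
      private_network = var.vpc_self_link # the VPC your GKE cluster runs in
    }
  }
  deletion_protection = false # easier to tear the side project down later
}

resource "google_sql_database" "flyteadmin" {
  name     = "flyteadmin"
  instance = google_sql_database_instance.flyte.name
}

resource "google_sql_user" "flyte" {
  name     = "flyteadmin"
  instance = google_sql_database_instance.flyte.name
  password = var.flyte_db_password
}

resource "google_storage_bucket" "flyte" {
  name                        = "${var.gcp_project}-flyte"
  location                    = "US"
  uniform_bucket_level_access = true
}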
I currently have everything set up except for the DNS; I’m connecting it at the moment. I’ll update here if there’s anything else worth mentioning about this process. Hope this helps
k
Thanks @Victor Churikov for the detailed writeup. I'm using the flyte-binary chart, but the steps should mostly be the same, and I think you mentioned the TLS (cert-manager) component to fix the certificate verification failure while the helm chart is being deployed, which makes sense.
I'm currently banging my head against fixing the TLS error; once that is sorted out, this will be an achievement for sure 🙌
Reading through the writeup now to figure out where I'm going wrong
v
I got it working like this:
Copy code
resource "cloudflare_record" "flyte" {
  zone_id  = var.cloudflare_zone_id
  name     = local.flyte_host
  value    = data.kubernetes_service.nginx-lb.status[0].load_balancer[0].ingress[0].ip
  type     = "A"
  ttl      = 3600
  priority = 10
  proxied  = false
}

data "kubernetes_service" "nginx-lb" {
  metadata {
    name      = "${module.nginx-ingress[0].release_name}-ingress-nginx-controller"
    namespace = module.nginx-ingress[0].namespace
  }
  depends_on = [module.nginx-ingress]
}
Note that I used modules wrapping the charts instead of the helm_release directly. You don’t need to do this; you can use the helm_release directly. This is something unique to my own use case because of other, unrelated requirements. The idea is that you add a data source for the ingress-nginx load balancer service, configured with the service name (which will have the release name as its prefix) and the namespace where you expect it to be. Then you can get its IP address using data.kubernetes_service.nginx-lb.status[0].load_balancer[0].ingress[0].ip
If using a ClusterIssuer, you have to make sure you configure the annotations of the ingress resources in the flyte helm values accordingly. Example from my flyte-core values.yaml templatefile:
Copy code
common:
  ingress:
    host: "{{ .Values.userSettings.hostName }}"
    tls:
      enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
      cert-manager.io/cluster-issuer: "letsencrypt-production"
      nginx.ingress.kubernetes.io/whitelist-source-range: ${whitelisted_cidrs}
Use either the cert-manager.io/cluster-issuer or the cert-manager.io/issuer annotation depending on which kind you created with the kubernetes_manifest; it has to match.
k
Thanks Victor for this, although it's a bit over my head as I'm not that good at infra, certificates, and DNS; trying to wrap my head around it. I think it makes sense to add the cert-manager one as a manifest, as I was reading documentation on how to issue client TLS certificates for the k8s cluster
Since the helm chart for flyte-binary (which is a single-cluster setup) is failing during helm install via helm_release, this makes sense as the way to solve the 'Kubernetes cluster unreachable' error
Meanwhile I was able to get the helm CLI to deploy the flyte-binary chart without any values
And it's now waiting on the DB since I didn't set any values, so I'm checking how to do it via terraform to avoid exposing DB details
v
In the end I gave up on using GKE Autopilot because of this bug in GCP that makes it unsuitable for ML workflows: https://issuetracker.google.com/issues/227162588
More on the GCP IAM permissions: google_container_cluster should be configured with a workload_identity_config block like this:
Copy code
resource "google_container_cluster" "gke" {
...
  workload_identity_config {
    workload_pool = "${var.gcp_project}.svc.id.goog"
  }
...
}
Then create the service accounts for each flyte component by looping with for_each over the local map I shared above:
Copy code
resource "google_service_account" "flyte_sa" {
  for_each = local.service_accounts
  account_id   = each.key
  display_name = each.key
  project      = var.gcp_project
}
Add the custom role for each:
Copy code
resource "google_project_iam_custom_role" "flyte_role" {
  for_each = local.service_accounts
  title       = each.key
  project     = var.gcp_project
  permissions = each.value
  role_id = "${each.key}_${random_string.role_id_suffix.id}", #roles are not deleted immediately behind the scenes so name should be unique, use random_string resource to generate a suffix
}
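The random_string referenced above isn’t defined in the thread; a minimal version might look like this (length is arbitrary, and the characters must stay within what a role_id allows):
Copy code
resource "random_string" "role_id_suffix" {
  length  = 6
  special = false # role_id only allows letters, digits, underscores and periods
  upper   = false
}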
Bind the gcp roles to the gcp service accounts:
Copy code
resource "google_project_iam_member" "membership" {
  for_each = local.service_accounts
  project = var.gcp_project
  role    = google_project_iam_custom_role.flyte_role[each.key].name
  member  = "serviceAccount:${google_service_account.flyte_sa[each.key].email}"
}
Create bindings to allow kubernetes serviceaccounts (kind: ServiceAccount) to use workload identity permissions:
Copy code
resource "google_service_account_iam_member" "flyteworkflow_sa_binding" {
  for_each = toset(["development", "staging", "production"])
  service_account_id = google_service_account.flyte_sa["flyteworkflow"].id
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.gcp_project}.svc.id.goog[${each.key}/default]"
}
These all loop over the same map, so it’s a good idea to put them in a terraform module and do the for_each once on the module call; I’m not doing that here to keep the example simple. With this IAM setup, flyte works for me from terraform
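For reference, the module-call shape I mean is roughly this (the module path and inputs are hypothetical, whatever your wrapper module exposes):
Copy code
module "flyte_iam" {
  source   = "./modules/flyte-iam" # hypothetical module containing the SA, custom role, and bindings above
  for_each = local.service_accounts

  gcp_project = var.gcp_project
  name        = each.key
  permissions = each.value
}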
k
Makes sense, no issues. I finally found a way to get helm to work via terraform for deploying flyte-binary, as my data pipelines won't be that huge in number. But this is helpful, thanks Victor