This seems a better channel for my problem. For the dns-domain... (Flyte, #flyte-on-gcp)

billowy-glass-15228

09/26/2024, 11:06 PM

This seems a better channel for my problem. For the dns-domain, I use companyname.com. Not sure if it is needed, but I can also create a DNS server on GCP.

average-finland-92144

09/27/2024, 3:15 PM

Hey Roman

'terraform apply' failed several times asking me to install Cloud Resource Manager API, Cloud SQL Admin API, Service Usage API.

Let me check. I tried this with a new project (but several months ago) and the required services were covered here. I think it doesn't hurt much if we add these other APIs there.
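
For anyone hitting the same failures, a minimal sketch of enabling the three APIs named above before running 'terraform apply'. The service names are the usual googleapis.com identifiers for those APIs; the helper name enable_apis_cmd and the project ID are placeholders, not something from the deploy-flyte repo. The helper only prints the command so it can be reviewed first.

```shell
# APIs that 'terraform apply' asked for in this thread:
# Cloud Resource Manager, Cloud SQL Admin, Service Usage.
REQUIRED_APIS="cloudresourcemanager.googleapis.com sqladmin.googleapis.com serviceusage.googleapis.com"

# Print (do not run) the gcloud command that enables the APIs.
# $1 is your GCP project ID (placeholder, supply your own).
enable_apis_cmd() {
  project_id="$1"
  printf 'gcloud services enable %s --project=%s\n' "$REQUIRED_APIS" "$project_id"
}

# Usage (hypothetical project ID):
#   enable_apis_cmd my-flyte-project          # review the command
#   eval "$(enable_apis_cmd my-flyte-project)" # then run it
```

Once the APIs are enabled, re-running 'terraform apply' should get past the "API not enabled" errors, although it does not address the cluster ordering problem discussed later in the thread.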

average-finland-92144

09/27/2024, 3:15 PM

So it fails to reach the GKE cluster, but it finished creating the cluster?

average-finland-92144

09/27/2024, 3:17 PM

For the dns-domain, I use companyname.com. Not sure if it is needed,

If you don't have a domain around, you could just use local DNS resolution or nip.io
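
As a rough illustration of the nip.io option: nip.io resolves any hostname that embeds an IPv4 address to that address, so no DNS zone is needed. The sketch below builds such a hostname; the function name nip_host, the "flyte" prefix, and the example IP are assumptions for illustration only.

```shell
# Build a nip.io hostname that resolves to the given IPv4 address.
# $1: a label of your choice, $2: the external IP (e.g. of the ingress).
nip_host() {
  name="$1"
  ip="$2"
  printf '%s.%s.nip.io\n' "$name" "$ip"
}

# Usage (203.0.113.10 is a documentation-only example address):
#   nip_host flyte 203.0.113.10
```

The resulting name could then be used wherever the deployment expects a DNS name. Whether Let's Encrypt certificate issuance works for such a name in this setup is not covered in the thread.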

billowy-glass-15228

09/27/2024, 3:20 PM

So it fails to reach the GKE cluster, but it finished creating the cluster?

I ran 'terraform apply' again after installing the services and waited about ten minutes; it showed 9 resources (out of the initial 61) left to create.

billowy-glass-15228

09/27/2024, 3:21 PM

I ran the Flyte deploy on a newly created Google Cloud project with a new bucket.

billowy-glass-15228

09/27/2024, 3:23 PM

Could my problem be solved?

average-finland-92144

09/27/2024, 3:34 PM

Absolutely. Can you share the output of 'terraform version'? Are you connected to any other K8s cluster?

billowy-glass-15228

09/27/2024, 3:49 PM

terraform version
Terraform v1.5.7

There are other GKE K8s clusters in our domain. My GCP project is brand new; no resources other than the Flyte deploy were allocated.

average-finland-92144

09/27/2024, 4:04 PM

I guess other resources were already created? I don't get why the apply operation would start by trying to create the cert-manager CRDs, unless it has already created the GKE cluster and the NGINX controller. Can you share the output of 'terraform plan'?

billowy-glass-15228

09/27/2024, 4:07 PM

I guess other resources were already created?

I have created a new GCP project and only installed what is required by the Flyte deploy. I have run 'terraform apply' several times; it failed on missing service APIs, which I installed before re-running 'terraform apply'.

billowy-glass-15228

09/27/2024, 4:11 PM

Here is the output - it is very long:

  # kubectl_manifest.cert-manager-issuer will be created
  + resource "kubectl_manifest" "cert-manager-issuer" {
      + api_version             = "cert-manager.io/v1"
      + apply_only              = false
      + field_manager           = "kubectl"
      + force_conflicts         = false
      + force_new               = false
      + id                      = (known after apply)
      + kind                    = "Issuer"
      + live_manifest_incluster = (sensitive value)
      + live_uid                = (known after apply)
      + name                    = "letsencrypt-production"
      + namespace               = "flyte"
      + server_side_apply       = false
      + uid                     = (known after apply)
      + validate_schema         = true
      + wait_for_rollout        = true
      + yaml_body               = (sensitive value)
      + yaml_body_parsed        = <<-EOT
            apiVersion: cert-manager.io/v1
            kind: Issuer
            metadata:
              name: letsencrypt-production
              namespace: flyte
            spec:
              acme:
                email: roman@vinci4d.ai
                privateKeySecretRef:
                  name: letsencrypt-production
                server: https://acme-v02.api.letsencrypt.org/directory
                solvers:
                - http01:
                    ingress:
                      ingressClassName: nginx
        EOT
      + yaml_incluster          = (sensitive value)
    }

Plan: 9 to add, 0 to change, 0 to destroy.

────────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if you run "terraform apply" now.

average-finland-92144

09/27/2024, 4:36 PM

Let me try to reproduce this. I don't think 9 resources are the result of the first execution of these modules.

average-finland-92144

09/27/2024, 10:25 PM

Hey Roman,

So, I reproduced this behaviour, and while there's a workaround, this is something the module should handle better. Specifically, with most of the resources in the 'ingress' module (like cert-manager), we're basically trying to create resources or detect resource state on a GKE cluster that doesn't exist yet. So if you could create an Issue to capture this problem it'd be great. I'll take point on separating GKE into a different module and state, so the resources go after the cluster (module dependencies are not enough for this). While the current setup works, it makes apply/destroy operations unstable.

I saw this issue when trying to reuse an existing project and resources, so I removed state for the resources that were complaining, like 'terraform state rm helm_release.flyte-core' for example, and after that plan and apply work.

billowy-glass-15228

09/30/2024, 4:29 PM

So if you could create an Issue to capture this problem it'd be great

Sure, can you please share the most relevant link/reference

billowy-glass-15228

09/30/2024, 4:31 PM

so I removed state for the resources that were complaining, like
terraform state rm helm_release.flyte-core

So, would you recommend that I remove the resources and re-run 'terraform apply'?

average-finland-92144

09/30/2024, 5:41 PM

Sure, can you please share the most relevant link/reference

Yes: https://github.com/unionai-oss/deploy-flyte/issues/new

So, would you recommend that I remove the resources and re-run 'terraform apply'?

yep, including removing state for the cert-manager related resources
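
A minimal sketch of that workaround, assuming the resource addresses in your state look like the ones shown in this thread (helm_release.flyte-core, kubectl_manifest.cert-manager-issuer). The helper name state_rm_cmds and the match pattern are illustrative; the helper only prints 'terraform state rm' commands so they can be reviewed before anything is removed.

```shell
# Read resource addresses on stdin (e.g. from 'terraform state list') and
# print a 'terraform state rm' command for each address matching the
# extended regex given as $1. Nothing is removed by this function.
state_rm_cmds() {
  pattern="$1"
  grep -E "$pattern" | while IFS= read -r addr; do
    printf "terraform state rm '%s'\n" "$addr"
  done
}

# Usage (pattern is an assumption; adjust it to the resources that complain):
#   terraform state list | state_rm_cmds 'flyte-core|cert-manager'
# Review the printed commands, run them, then re-run:
#   terraform plan
#   terraform apply
```

Removing an entry from state only makes Terraform forget the resource; it does not delete anything in GCP or in the cluster, so the next apply will try to create those resources again.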

billowy-glass-15228

09/30/2024, 7:38 PM

When are you planning to work on this issue? tbh I'd prefer to deploy Flyte from the updated repo.

average-finland-92144

09/30/2024, 7:41 PM

I'll try to prioritize this, but I'm not sure how long it may take. That's why in the meantime you could try that workaround, or just set up the GCP resources manually.

billowy-glass-15228

10/01/2024, 3:59 PM

Thank you for letting me know about this option. tbh I am stretched really thin right now. We are benchmarking many ML platforms. Since [the ease of] deployment is our most important criterion, I'd better reproduce the exact deploy instructions.
