# flyte-on-gcp
b
This seems a better channel for my problem. For the dns-domain, I use companyname.com. Not sure if it is needed, but I can also create a DNS server on GCP.
a
Hey Roman
'terraform apply' failed several times, asking me to enable the Cloud Resource Manager API, Cloud SQL Admin API, and Service Usage API.
Let me check. I tried this with a new project (though several months ago) and the required services were covered here. I don't think it hurts to add these other APIs there.
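In case it helps while the module is updated, here is a minimal sketch for enabling the three APIs named in those errors in one go. The service IDs are the standard googleapis.com names for those products; `enable_apis_cmd` and `my-project-id` are made-up names for illustration, and the script only prints the command so it can be reviewed first.

```shell
# Service IDs for the three APIs named in the terraform errors:
# Cloud Resource Manager, Cloud SQL Admin, Service Usage.
REQUIRED_APIS="cloudresourcemanager.googleapis.com sqladmin.googleapis.com serviceusage.googleapis.com"

# Print (not run) the gcloud command for a given project ID.
# enable_apis_cmd is a hypothetical helper, not part of any tool.
enable_apis_cmd() {
  # $1: GCP project ID
  echo "gcloud services enable ${REQUIRED_APIS} --project $1"
}

# my-project-id is a placeholder; set PROJECT_ID to the real project.
enable_apis_cmd "${PROJECT_ID:-my-project-id}"
```

Once the printed command looks right, it can be pasted into a shell that is authenticated against the project, followed by another 'terraform apply'.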
so it fails to reach the GKE cluster, but it finished creating the cluster?
For the dns-domain, I use companyname.com. Not sure if it is needed,
If you don't have a domain around, you could just use local DNS resolution or nip.io
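A quick sketch of the nip.io option: any hostname of the form `<name>.<ip>.nip.io` resolves to `<ip>`, so no DNS zone is needed. The helper name `nip_host`, the `flyte` prefix, and the address below are assumptions for illustration; which variable in the module should receive the resulting hostname is not covered here.

```shell
# Build a nip.io hostname for an IPv4 address.
# nip_host is a hypothetical helper name.
nip_host() {
  # $1: IPv4 address, $2: optional name prefix
  if [ -n "${2:-}" ]; then
    echo "$2.$1.nip.io"
  else
    echo "$1.nip.io"
  fi
}

# 203.0.113.10 is a documentation-only example address;
# substitute the external IP of the ingress once it exists.
nip_host 203.0.113.10 flyte
```

The printed name can be checked with a DNS lookup tool of your choice; it should answer with the same address that is embedded in the hostname.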
b
so it fails to reach the GKE cluster, but it finished creating the cluster?
I ran 'terraform apply' and waited ten minutes after enabling the services; it showed 9 resources (out of the initial 61) left to create.
I ran the flyte deploy on a newly created Google Cloud project and a new bucket
Could my problem be solved?
a
absolutely. can you share the output of
terraform version
? are you connected to any other K8s cluster?
b
terraform version
Terraform v1.5.7
There are other GKE K8s clusters in our domain. My GCP project is brand new; no other resources besides the Flyte deploy were allocated.
a
I guess other resources were already created? I don't get why the apply operation would start by trying to create the cert-manager CRDs, unless it has already created the GKE cluster and the NGINX controller. Can you share the output of
terraform plan
?
b
I guess other resources were already created?
I have created a new GCP project and only installed what is required by the Flyte deploy. I have run terraform apply several times; it failed on missing service APIs, which I enabled, and then I re-ran terraform apply.
Here is the output - it is very long:

  # kubectl_manifest.cert-manager-issuer will be created
  + resource "kubectl_manifest" "cert-manager-issuer" {
      + api_version             = "cert-manager.io/v1"
      + apply_only              = false
      + field_manager           = "kubectl"
      + force_conflicts         = false
      + force_new               = false
      + id                      = (known after apply)
      + kind                    = "Issuer"
      + live_manifest_incluster = (sensitive value)
      + live_uid                = (known after apply)
      + name                    = "letsencrypt-production"
      + namespace               = "flyte"
      + server_side_apply       = false
      + uid                     = (known after apply)
      + validate_schema         = true
      + wait_for_rollout        = true
      + yaml_body               = (sensitive value)
      + yaml_body_parsed        = <<-EOT
            apiVersion: cert-manager.io/v1
            kind: Issuer
            metadata:
              name: letsencrypt-production
              namespace: flyte
            spec:
              acme:
                email: roman@vinci4d.ai
                privateKeySecretRef:
                  name: letsencrypt-production
                server: https://acme-v02.api.letsencrypt.org/directory
                solvers:
                - http01:
                    ingress:
                      ingressClassName: nginx
        EOT
      + yaml_incluster          = (sensitive value)
    }

Plan: 9 to add, 0 to change, 0 to destroy.

────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if you run "terraform apply" now.
a
let me try to reproduce this, I don't think 9 resources are the result of the first execution of these modules
Hey Roman. So, I reproduced this behaviour, and while there's a workaround, this is something the module should handle better. Specifically, with most of the resources in the
ingress
module (like cert-manager), we're basically trying to create resources or detect resource state on a GKE cluster that doesn't exist yet. So if you could create an Issue to capture this problem it'd be great. I'll take point on separating GKE into a different module and state, so that these resources are created after the cluster (module dependencies are not enough for this). The current setup works, but it makes apply/destroy operations unstable. I saw this issue when trying to reuse an existing project and resources, so I removed state for the resources that were complaining, like
terraform state rm helm_release.flyte-core
for example, and after that plan and apply work
b
So if you could create an Issue to capture this problem it'd be great
Sure, can you please share the most relevant link/reference
so I removed state for the resources that were complaining, like
terraform state rm helm_release.flyte-core
so, would you recommend that I remove the resources and re-run 'terraform apply'?
a
Sure, can you please share the most relevant link/reference
Yes: https://github.com/unionai-oss/deploy-flyte/issues/new
so, would you recommend that I remove the resources and re-run 'terraform apply'?
yep, including removing state for the cert-manager related resources
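Roughly, the workaround could be scripted like this. It is only a sketch: the grep pattern is a guess at which state addresses are affected, `select_stale` is a made-up helper name, and `google_storage_bucket.example` is a hypothetical address included just to show something being filtered out. Note that `terraform state rm` only makes Terraform forget a resource; it does not delete anything in GCP or on the cluster.

```shell
# Keep only state addresses that look cert-manager or Helm related.
# The pattern is an assumption; adjust it to what `terraform state list` shows.
select_stale() {
  grep -E 'cert-manager|(^|\.)helm_release\.' || true
}

# Dry run on the two addresses mentioned in this thread, plus one
# hypothetical address that should be left alone.
printf '%s\n' \
  'helm_release.flyte-core' \
  'kubectl_manifest.cert-manager-issuer' \
  'google_storage_bucket.example' | select_stale

# Against real state (review the list before removing anything):
#   terraform state list | select_stale
#   terraform state list | select_stale | xargs -n 1 terraform state rm
```

The dry run should print only the first two addresses. After removing the real ones from state, 'terraform plan' followed by 'terraform apply' is the next step.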
b
When are you planning to work on this issue? tbh I'd prefer to deploy Flyte from the updated repo
a
I'll try to prioritize this, but I'm not sure how long it may take. That's why in the meantime you could try that workaround, or just set up the GCP resources manually
b
Thank you for letting me know about this option. tbh I am stretched really thin right now. We are benchmarking many ML platforms. Since ease of deployment is our most important criterion, I'd better follow the exact deploy instructions.