# flyte-on-gcp
a
I’m trying to run the above GCP Terraform. I got these three errors. I’m not sure why it didn’t find the flyte namespace; I was able to point to it with kubectl. And I can’t find those two bucket names to know where to change them. Any ideas?
╷
│ Error: namespaces "flyte" not found
│
│   with kubernetes_secret.flyte-tls-secret,
│   on ingress.tf line 40, in resource "kubernetes_secret" "flyte-tls-secret":
│   40: resource kubernetes_secret "flyte-tls-secret" {
│
╵
╷
│ Error: googleapi: Error 409: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again., conflict
│
│   with module.flyte_data.google_storage_bucket.buckets["flyte-gcp-data"],
│   on .terraform/modules/flyte_data/main.tf line 40, in resource "google_storage_bucket" "buckets":
│   40: resource "google_storage_bucket" "buckets" {
│
╵
╷
│ Error: googleapi: Error 409: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again., conflict
│
│   with module.flyte_user_data.google_storage_bucket.buckets["flyte-gcp-user-data"],
│   on .terraform/modules/flyte_user_data/main.tf line 40, in resource "google_storage_bucket" "buckets":
│   40: resource "google_storage_bucket" "buckets" {
d
ok, good findings. Thanks!
1. For the secret issue, I'll try with an explicit dependency on the ns resource.
2. Right. GCS bucket names have to be globally unique, so I'll add a randomized suffix to avoid this situation.
PR to come
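A minimal sketch of the explicit dependency mentioned in point 1. It assumes the namespace is managed by a `kubernetes_namespace` resource; that resource and the `tls_cert` / `tls_key` variables are hypothetical names, and the repo's actual `ingress.tf` may be organized differently (for example, if a Helm release creates the namespace, the dependency would point at that release instead).

```hcl
# Hypothetical inputs for the TLS material.
variable "tls_cert" { type = string }
variable "tls_key"  { type = string }

# Assumed: the namespace is created by Terraform itself.
resource "kubernetes_namespace" "flyte" {
  metadata {
    name = "flyte"
  }
}

resource "kubernetes_secret" "flyte-tls-secret" {
  metadata {
    name = "flyte-tls-secret"
    # Referencing the namespace resource already creates an implicit dependency.
    namespace = kubernetes_namespace.flyte.metadata[0].name
  }

  type = "kubernetes.io/tls"

  data = {
    "tls.crt" = var.tls_cert
    "tls.key" = var.tls_key
  }

  # Explicit ordering: create the secret only after the namespace exists.
  depends_on = [kubernetes_namespace.flyte]
}
```

With either the attribute reference or `depends_on` in place, Terraform orders the secret after the namespace, which avoids the `namespaces "flyte" not found` error when both are created in the same apply.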
a
Awesome, thanks!
d
please fetch. I just pushed a commit. Basically using the PROJECT_NUMBER to build a globally-unique ID for GCS buckets. Updated the instructions too
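One way the project-number approach could look, as a sketch only. The thread's errors show the buckets being created through a module; this example uses a plain `google_storage_bucket` resource for brevity, and the local name, resource name, and location are all assumptions rather than what the commit contains.

```hcl
# Reads metadata of the provider's configured project, including its numeric ID.
data "google_project" "current" {}

locals {
  # Hypothetical naming scheme: base name plus the project number,
  # which is unique per GCP project.
  flyte_data_bucket = "flyte-gcp-data-${data.google_project.current.number}"
}

resource "google_storage_bucket" "flyte_data" {
  name     = local.flyte_data_bucket
  location = "US" # assumed location
}
```

Because the project number differs for every project, two users applying the same configuration end up with different bucket names, which sidesteps the 409 conflict on the shared GCS namespace.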
a
Trying again. What’s the best way to ignore already existing resources? For example service accounts it created on the first apply
d
Terraform should ignore them by default
a
Oh, for some reason it errored out on service accounts
One example:
│ Error: Error creating service account: googleapi: Error 409: Service account flyte-gcp-flyteadmin already exists within project projects/dai-ml-pipelines.
│ Details:
│ [
│   {
│     "@type": "type.googleapis.com/google.rpc.ResourceInfo",
│     "resourceName": "projects/<project>/serviceAccounts/flyte-gcp-flyteadmin@<project>.iam.gserviceaccount.com"
│   }
│ ]
│ , alreadyExists
│
│   with google_service_account.flyteadmin-gsa,
│   on iam.tf line 15, in resource "google_service_account" "flyteadmin-gsa":
│   15: resource "google_service_account" "flyteadmin-gsa" {
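A 409 `alreadyExists` like this usually means the resource exists in GCP but is not recorded in Terraform's state, so Terraform tries to create it again. One option, assuming Terraform 1.5 or later, is an `import` block that adopts the existing service account into state. The `<project>` placeholders are kept from the error output and need to be replaced with the real project ID.

```hcl
# Adopt the pre-existing service account instead of creating it.
# Requires Terraform >= 1.5 (import blocks).
import {
  to = google_service_account.flyteadmin-gsa
  id = "projects/<project>/serviceAccounts/flyte-gcp-flyteadmin@<project>.iam.gserviceaccount.com"
}
```

After a successful `terraform apply`, the block can be removed; the service account is then tracked in state like any other resource.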
d
wow that's weird
let's terraform destroy and try again?
a
sounds good
destroy didn’t entirely work, and now I think I’m in a weird state. The main issue seems to be the VPC network
d
ye, the VPC network most likely has dependencies. you can delete it from the UI, then destroy again.
a
It says "The auto-generated peering route cannot be deleted." when I try to delete it; I missed that part
But I don’t see any peering routes
d
yep, go into VPC network > VPC peering and there's one
a
I think I’d already done that. It seems like that line is just an artifact left over. I’m trying an apply again
I think it's up and running now! I’ll have to come back to it next week to test it out and try some workflows. Thanks for all the help
d
great to hear. any problem that arises, just let me know, I hope to keep improving these modules
a
one other quick question. Since I didn’t use helm directly, how would the part about authentication and changing the helm values file work? I know there’s the values-gcp-core.yaml, but do I helm install with that?
d
you should use that values file, yes, but apply it with terraform apply
a
Hey, another follow-up to this. The console was working fine for a bit, and then it stopped loading. When I checked the logs for the admin pods (which show CrashLoopBackOff), I got this:
Defaulted container "flyteadmin" out of: flyteadmin, run-migrations (init), seed-projects (init), sync-cluster-resources (init), generate-secrets (init)
time="2023-11-28T00:11:12Z" level=info msg="Using config file: [/etc/flyte/config/cluster_resources.yaml /etc/flyte/config/clusters.yaml /etc/flyte/config/db.yaml /etc/flyte/config/domain.yaml /etc/flyte/config/remoteData.yaml /etc/flyte/config/server.yaml /etc/flyte/config/storage.yaml /etc/flyte/config/task_resource_defaults.yaml]"
Error: [CERTIFICATE_FAILURE] failed to load X509 key pair: , caused by: open : no such file or directory
[CERTIFICATE_FAILURE] failed to load X509 key pair: , caused by: open : no such file or directory
Usage:
  flyteadmin serve [flags]

Flags:
  -h, --help   help for serve

Global Flags:
      --admin.audience string                                                      Audience to use when initiating OAuth2 
... < a lot of doc output here>

settings for file-filtered logging

panic: [CERTIFICATE_FAILURE] failed to load X509 key pair: , caused by: open : no such file or directory

goroutine 1 [running]:
main.main()
        /go/src/github.com/flyteorg/flyteadmin/cmd/main.go:14 +0x9f
d
sorry, can you revert secure to false? It should be overridden by the insecure: false flag in your local config file
a
perfect, I think that was it, it's up again right now. thanks!
d
great. I'm curious, does the console show you a valid certificate? It should do so, but just checking 🙂
a
Looks like it does, it's showing as secured 🙂