Are the docs regarding multi-cluster deployment up to date? We completed all steps as instructed on EKS. However:
• flyteadmin became unhealthy because the cluster-credentials secret is not being mounted correctly to /var/run/credentials in the sync-cluster-resources init container.
• syncresources also became unhealthy, because the secret was not being mounted to the service at all (as far as we could tell)
That being said, after fixing both issues manually, we eventually saw some signs of life on our data plane cluster. However, the flytepropeller service on the data plane cluster didn’t seem to be able to reconcile the workflow correctly, and instead referenced a nonexistent flyteadmin service:
E0216 14:41:48.783203 1 workers.go:102] error syncing 'flytesnacks-development/f9f0589cf97c949c892f': Workflow[] failed. ErrorRecordingError: failed to publish event, caused by: EventSinkError: Error sending event, caused by [rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup flyteadmin on 10.100.0.10:53: no such host"]
Is there a full working example? What are we missing? Or would anyone from the community be up for an exchange, if you have succeeded with the multi-cluster setup?
For final background:
• We are aiming for the multi-cluster setup for operational reasons, but mostly for data isolation. That way we can isolate customer data at the cluster and AWS account level.