# flyte-deployment
f
Are the docs regarding multi-cluster deployment up to date? We completed all steps as instructed on EKS, however:
• flyteadmin became unhealthy, because the cluster-credentials secret is not being mounted correctly to `/var/run/credentials` in the `sync-cluster-resources` init container.
• syncresources also became unhealthy, because the secret was not being mounted into the service at all (as far as we could tell).
That said, after solving both issues manually we eventually saw some signs of life on our data plane cluster. However, the `flytepropeller` service on the data plane cluster didn't seem to be able to reconcile the workflow correctly, instead referencing a non-existent flyteadmin service:
```
E0216 14:41:48.783203 1 workers.go:102] error syncing 'flytesnacks-development/f9f0589cf97c949c892f': Workflow[] failed. ErrorRecordingError: failed to publish event, caused by: EventSinkError: Error sending event, caused by [rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup flyteadmin on 10.100.0.10:53: no such host"]
```
Is there a full working example? What are we missing? Or would anyone from the community be up for an exchange, if you have succeeded with the multi-cluster setup? For background: we aim for the multi-cluster setup for operational reasons, but mostly for data isolation, so we can isolate customer data at the cluster and AWS account level.
k
You have to change the flytepropeller config to connect to the right flyteadmin.
Check the `admin:` section in the config.
Propeller has to connect to admin to work.
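For reference, a minimal sketch of what that override can look like in the data plane Helm values, matching the `configmap.admin` structure that comes up later in this thread. The hostname, port, and TLS setting are placeholders, not from this conversation:
```yaml
# Data plane Helm values: point propeller at the control plane's admin
# service instead of the default in-cluster `flyteadmin` hostname.
# flyte.example.com:443 is a placeholder endpoint (assumption).
configmap:
  admin:
    admin:
      endpoint: flyte.example.com:443
      insecure: false  # assumes admin is served over TLS; set true for plaintext
```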
f
That makes sense, do you have some documentation for what is available? I can't find it in the `values.yaml` I'm using:
```yaml
flytepropeller:
  enabled: true
  manager: false
  # -- Whether to install the flyteworkflows CRD with helm
  createCRDs: true
  # -- Replicas count for Flytepropeller deployment
  replicaCount: 1
  image:
    # -- Docker image for Flytepropeller deployment
    repository: cr.flyte.org/flyteorg/flytepropeller # FLYTEPROPELLER_IMAGE
    tag: v1.1.62 # FLYTEPROPELLER_TAG
    pullPolicy: IfNotPresent
  # -- Default resources requests and limits for Flytepropeller deployment
  resources:
    limits:
      cpu: 200m
      ephemeral-storage: 100Mi
      memory: 200Mi
    requests:
      cpu: 10m
      ephemeral-storage: 50Mi
      memory: 100Mi
  cacheSizeMbs: 0
  # -- Error reporting
  terminationMessagePolicy: FallbackToLogsOnError
  # -- Default regex string for searching configuration files
  configPath: /etc/flyte/config/*.yaml

  # -- Configuration for service accounts for FlytePropeller
  serviceAccount:
    # -- Should a service account be created for FlytePropeller
    create: true
    # -- Annotations for ServiceAccount attached to FlytePropeller pods
    annotations: {}
    # -- ImagePullSecrets to automatically assign to the service account
    imagePullSecrets: []
  # -- Annotations for Flytepropeller pods
  podAnnotations: {}
  # -- nodeSelector for Flytepropeller deployment
  nodeSelector: {}
  # -- tolerations for Flytepropeller deployment
  tolerations: []
  # -- affinity for Flytepropeller deployment
  affinity: {}
  # -- Appends extra command line arguments to the main command
  extraArgs: {}
  # -- Defines the cluster name used in events sent to Admin
  clusterName: ""
  # -- Sets priorityClassName for propeller pod(s).
  priorityClassName: ""
```
And what about the cluster-credentials secret not being mounted correctly? Did we forget to configure something?
Friendly ping @Ketan (kumare3) and maybe @David Espejo (he/him) 🫶 could you point us to documentation to understand:
1. how to connect propeller in the data plane cluster to the admin service in the control plane cluster
2. how to mount the secrets for accessing the data plane cluster from the control plane cluster correctly (the current guide seems to be missing details)
d
Hi @Ferdinand von den Eichen!
1. You should be able to find the `admin` section in the `flytepropeller` configmap.
2. Are you able to retrieve the secrets from the data plane clusters?
```
kubectl get secrets -n flyte | grep flyteadmin-token
```
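For context, the secret that grep looks for is a standard service-account token secret on each data plane cluster. A minimal sketch follows; the service account name `flyteadmin` is an assumption inferred from the secret's name, and on Kubernetes 1.24+ such token secrets must be created explicitly:
```yaml
# Token secret on each data plane cluster, bound to the (assumed)
# flyteadmin service account; Kubernetes populates it with a token
# and the cluster CA certificate.
apiVersion: v1
kind: Secret
metadata:
  name: flyteadmin-token
  namespace: flyte
  annotations:
    kubernetes.io/service-account.name: flyteadmin
type: kubernetes.io/service-account-token
```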
f
Ok, I got it to work 💪. Would really be nice if some things were documented in the official guide. I added the following
```yaml
configmap:
  admin:
    admin:
      endpoint: flyte.<environment>.kineo.ai:443
```
and mounted several secrets that were missing in the guide.
I noticed some strange things in the multi-cluster setup:
• When the data plane cluster (team1) was down, runs would still be scheduled onto the control plane cluster, even though the project flytesnacks should technically schedule everything onto team1, according to its execution cluster label yaml:
```yaml
domain: development
project: flytesnacks
value: team1
```
Of course all of these runs fail, because the control plane cluster does not have access to the data. How can I make sure that runs don’t get scheduled AT ALL when the target cluster is unavailable? What is the intended behaviour here? We want to scale clusters up and down dynamically…
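A sketch only, reusing the `clusterConfigs` structure shown later in this thread: one static lever that does appear to exist is the per-cluster `enabled` flag. The assumption here is that admin will not schedule onto a cluster whose flag is false, so flipping it (and re-deploying admin) acts as a manual drain while a data plane cluster is scaled down, rather than any dynamic behaviour:
```yaml
# Assumed semantics: disabled clusters are excluded from scheduling.
configmap:
  clusters:
    clusterConfigs:
    - name: "team1"
      endpoint: ...
      enabled: false  # temporarily removed from scheduling while scaled down
```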
k
@Ferdinand von den Eichen Flyte does not have dynamic cluster management today. This is a huge body of work and not really required by most teams. Again, we never really do this, but Union Cloud has support for dynamic cluster provisioning and management, plus auto-healing. Happy to DM.
a
@Ferdinand von den Eichen the original message looks like an issue I had. I added
```
{{- with .Values.flyteadmin.additionalVolumeMounts -}}
          {{ tpl (toYaml .) $ | nindent 10 }}
          {{- end }}
```
to the `volumeMounts` section for both the `admin` and `clusterresourcesync` deployment specs.
f
Kickass remark, thank you ❤️! This resolved our first issue with the admin not becoming healthy. However, for the `clusterresourcesync` deployment it seems that both the cluster-credentials and the flyte-admin-secrets are bound to the same `/var/run/credentials`. Isn't that an issue?
It does work, but I'm wondering if losing access to flyte-admin-secrets in the `clusterresourcesync` won't cause problems down the road…
a
Ah, right. Okay, so I forgot to mention I also included this in the `volumes` section of the `clusterresourcesync` deployment spec. Obviously you need the `additionalVolumes` to be able to have the `additionalVolumeMounts`…
```
{{- with .Values.flyteadmin.additionalVolumes -}}
        {{ tpl (toYaml .) $ | nindent 8 }}
        {{- end }}
```
Regarding `/var/run/credentials`, I do think I ran into that problem too. I used `/etc` instead. Here is an example of the cluster config file I used:
```yaml
flyteadmin:
  additionalVolumes:
  - name: cluster-credentials-1
    secret:
      secretName: cluster-credentials-1
  - name: cluster-credentials-2
    secret:
      secretName: cluster-credentials-2
  additionalVolumeMounts:
  - name: cluster-credentials-1
    mountPath: /etc/credentials_1/
  - name: cluster-credentials-2
    mountPath: /etc/credentials_2/
configmap:
  clusters:
    labelClusterMap:
      cluster_1:
      - id: cluster_1
        weight: 1
      cluster_2:
      - id: cluster_2
        weight: 1
    clusterConfigs:
    - name: "cluster_1"
      endpoint: ...
      enabled: true
      auth:
        type: "file_path"
        tokenPath: "/etc/credentials_1/token"
        certPath: "/etc/credentials_1/cacert"
    - name: "cluster_2"
      endpoint: ...
      enabled: true
      auth:
        type: "file_path"
        tokenPath: "/etc/credentials_2/token"
        certPath: "/etc/credentials_2/cacert"
```
Regarding appending `additionalVolumes` and `additionalVolumeMounts` to `clusterresourcesync`: I'm reusing the same ones from flyteadmin, so the more proper thing would be to have the cluster config define the same ones for both `admin` and `clusterresourcesync`.