# flyte-deployment
f
Are the docs regarding multi-cluster deployment up to date? We completed all steps as instructed on EKS, however:
• flyteadmin became unhealthy, because the cluster-credentials secret is not being mounted correctly to `/var/run/credentials` in the `sync-cluster-resources` init container.
• syncresources also became unhealthy, because the secret was not being mounted into the service at all (as far as we could tell).
That said, after solving both issues manually we eventually saw some signs of life on our data plane cluster. However, the `flytepropeller` service on the data plane cluster didn't seem to be able to reconcile the workflow correctly, instead referencing a non-existent flyteadmin service:
```
E0216 14:41:48.783203 1 workers.go:102] error syncing 'flytesnacks-development/f9f0589cf97c949c892f': Workflow[] failed. ErrorRecordingError: failed to publish event, caused by: EventSinkError: Error sending event, caused by [rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup flyteadmin on 10.100.0.10:53: no such host"]
```
Is there a full working example? What are we missing? Or would anyone from the community be up for an exchange, if you have succeeded with the multi-cluster setup? For background: we aim for the multi-cluster setup for operational reasons, but mostly for data isolation, so we can isolate customer data at the cluster and AWS account level.
k
You have to change the flytepropeller config to connect to the right flyteadmin.
Check the `admin:` section in the config.
Propeller has to connect to admin to work.
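For reference, a minimal sketch of what that override can look like in the data plane Helm values, matching the `configmap.admin` structure that comes up later in this thread. The hostname, port, and TLS setting are placeholders, not from this conversation:
```yaml
# Data plane Helm values: point propeller at the control plane's admin
# service instead of the default in-cluster `flyteadmin` hostname.
# flyte.example.com:443 is a placeholder endpoint (assumption).
configmap:
  admin:
    admin:
      endpoint: flyte.example.com:443
      insecure: false  # assumes admin is served over TLS; set true for plaintext
```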
f
That makes sense, do you have some documentation for what is available? I can't find it in the `values.yaml` I'm using:
```yaml
flytepropeller:
  enabled: true
  manager: false
  # -- Whether to install the flyteworkflows CRD with helm
  createCRDs: true
  # -- Replicas count for Flytepropeller deployment
  replicaCount: 1
  image:
    # -- Docker image for Flytepropeller deployment
    repository: cr.flyte.org/flyteorg/flytepropeller # FLYTEPROPELLER_IMAGE
    tag: v1.1.62 # FLYTEPROPELLER_TAG
    pullPolicy: IfNotPresent
  # -- Default resources requests and limits for Flytepropeller deployment
  resources:
    limits:
      cpu: 200m
      ephemeral-storage: 100Mi
      memory: 200Mi
    requests:
      cpu: 10m
      ephemeral-storage: 50Mi
      memory: 100Mi
  cacheSizeMbs: 0
  # -- Error reporting
  terminationMessagePolicy: FallbackToLogsOnError
  # -- Default regex string for searching configuration files
  configPath: /etc/flyte/config/*.yaml

  # -- Configuration for service accounts for FlytePropeller
  serviceAccount:
    # -- Should a service account be created for FlytePropeller
    create: true
    # -- Annotations for ServiceAccount attached to FlytePropeller pods
    annotations: {}
    # -- ImagePullSecrets to automatically assign to the service account
    imagePullSecrets: []
  # -- Annotations for Flytepropeller pods
  podAnnotations: {}
  # -- nodeSelector for Flytepropeller deployment
  nodeSelector: {}
  # -- tolerations for Flytepropeller deployment
  tolerations: []
  # -- affinity for Flytepropeller deployment
  affinity: {}
  # -- Appends extra command line arguments to the main command
  extraArgs: {}
  # -- Defines the cluster name used in events sent to Admin
  clusterName: ""
  # -- Sets priorityClassName for propeller pod(s).
  priorityClassName: ""
```
And what about the cluster-credentials secret not being mounted correctly? Did we forget to configure something?
Friendly ping @Ketan (kumare3) and maybe @David Espejo (he/him) 🫶 could you point us to documentation to understand:
1. how to connect propeller in the data plane cluster to the admin service in the control plane cluster
2. how to mount the secrets for accessing the data plane cluster from the control plane cluster correctly (the current guide seems to be missing details)
d
Hi @Ferdinand von den Eichen!
1. You should be able to find the `admin` section in the `flytepropeller` configmap.
2. Are you able to retrieve the secrets from the data plane clusters?
```
kubectl get secrets -n flyte | grep flyteadmin-token
```
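For context, the secret that grep looks for is a standard service-account token secret on each data plane cluster. A minimal sketch follows; the service account name `flyteadmin` is an assumption inferred from the secret's name, and on Kubernetes 1.24+ such token secrets must be created explicitly:
```yaml
# Token secret on each data plane cluster, bound to the (assumed)
# flyteadmin service account; Kubernetes populates it with a token
# and the cluster CA certificate.
apiVersion: v1
kind: Secret
metadata:
  name: flyteadmin-token
  namespace: flyte
  annotations:
    kubernetes.io/service-account.name: flyteadmin
type: kubernetes.io/service-account-token
```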
f
Ok, I got it to work 💪. Would really be nice if some things were documented in the official guide. I added the following
```yaml
configmap:
  admin:
    admin:
      endpoint: flyte.<environment>.kineo.ai:443
```
and mounted several secrets that were missing in the guide.
I noticed some strange things in the multi-cluster setup:
• When the data plane cluster (team1) was down, runs would still be scheduled onto the control plane cluster, even though the project flytesnacks should technically schedule everything onto team1, according to its execution cluster label yaml:
```yaml
domain: development
project: flytesnacks
value: team1
```
Of course all of these runs fail, because the control plane cluster does not have access to the data. How can I make sure that runs don’t get scheduled AT ALL when the target cluster is unavailable? What is the intended behaviour here? We want to scale clusters up and down dynamically…
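A sketch only, reusing the `clusterConfigs` structure shown later in this thread: one static lever that does appear to exist is the per-cluster `enabled` flag. The assumption here is that admin will not schedule onto a cluster whose flag is false, so flipping it (and re-deploying admin) acts as a manual drain while a data plane cluster is scaled down, rather than any dynamic behaviour:
```yaml
# Assumed semantics: disabled clusters are excluded from scheduling.
configmap:
  clusters:
    clusterConfigs:
    - name: "team1"
      endpoint: ...
      enabled: false  # temporarily removed from scheduling while scaled down
```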
k
@Ferdinand von den Eichen Flyte does not have dynamic cluster management today. This is a huge body of work and not really required by most teams. Again, we never really do this, but Union Cloud has support for dynamic cluster provisioning and management, plus auto-healing. Happy to DM.
a
@Ferdinand von den Eichen the original message looks like an issue I had. I added
```
{{- with .Values.flyteadmin.additionalVolumeMounts -}}
          {{ tpl (toYaml .) $ | nindent 10 }}
          {{- end }}
```
to the `volumeMounts` section for both the `admin` and `clusterresourcesync` deployment specs.
f
Kickass remark, thank you ❤️! This resolved our first issue with the admin not becoming healthy. However, for the `clusterresourcesync` deployment it seems that both the cluster-credentials and the flyte-admin-secrets are bound to the same `/var/run/credentials`. Isn't that an issue?
It does work, but I'm wondering if losing access to flyte-admin-secrets in the `clusterresourcesync` won't cause problems down the road…
a
Ah, right. Okay, so I forgot to mention I also included this in the `volumes` section of the `clusterresourcesync` deployment spec. Obviously you need the `additionalVolumes` to be able to have the `additionalVolumeMounts`…
```
{{- with .Values.flyteadmin.additionalVolumes -}}
        {{ tpl (toYaml .) $ | nindent 8 }}
        {{- end }}
```
Regarding `/var/run/credentials`, I do think I ran into that problem too. I used `/etc` instead. Here is an example of the cluster config file I used:
```yaml
flyteadmin:
  additionalVolumes:
  - name: cluster-credentials-1
    secret:
      secretName: cluster-credentials-1
  - name: cluster-credentials-2
    secret:
      secretName: cluster-credentials-2
  additionalVolumeMounts:
  - name: cluster-credentials-1
    mountPath: /etc/credentials_1/
  - name: cluster-credentials-2
    mountPath: /etc/credentials_2/
configmap:
  clusters:
    labelClusterMap:
      cluster_1:
      - id: cluster_1
        weight: 1
      cluster_2:
      - id: cluster_2
        weight: 1
    clusterConfigs:
    - name: "cluster_1"
      endpoint: ...
      enabled: true
      auth:
        type: "file_path"
        tokenPath: "/etc/credentials_1/token"
        certPath: "/etc/credentials_1/cacert"
    - name: "cluster_2"
      endpoint: ...
      enabled: true
      auth:
        type: "file_path"
        tokenPath: "/etc/credentials_2/token"
        certPath: "/etc/credentials_2/cacert"
```
Regarding appending `additionalVolumes` and `additionalVolumeMounts` to `clusterresourcesync`: I'm reusing the same ones from flyteadmin, so the more proper thing would be to have the cluster config define the same ones for both `admin` and `clusterresourcesync`.