Ferdinand von den Eichen
02/16/2023, 2:46 PM
• The cluster-credentials secret did not get mounted to /var/run/credentials in the sync-cluster-resources init container.
• syncresources also became unhealthy, because the secret was not being mounted to the service at all (as far as we could tell)
That being said, after solving both steps manually we eventually saw some signs of life on our data plane cluster. However, the flytepropeller service on the data plane cluster didn't seem to be able to reconcile the workflow correctly, instead referencing a non-existent flyteadmin service.
E0216 14:41:48.783203 1 workers.go:102] error syncing 'flytesnacks-development/f9f0589cf97c949c892f': Workflow[] failed. ErrorRecordingError: failed to publish event, caused by: EventSinkError: Error sending event, caused by [rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup flyteadmin on 10.100.0.10:53: no such host"]
Is there a full working example? What are we missing? Or would anyone from the community be up for an exchange, if you have succeeded in getting the multi-cluster setup to work?
For final background:
• We aim for the multi-cluster setup for operational reasons, but mostly for data isolation. That way we can isolate customer data at the cluster and AWS account level.
Ketan (kumare3)
02/16/2023, 3:34 PM
Ferdinand von den Eichen
02/17/2023, 7:19 AM
The values.yaml I'm using:
flytepropeller:
  enabled: true
  manager: false
  # -- Whether to install the flyteworkflows CRD with helm
  createCRDs: true
  # -- Replicas count for Flytepropeller deployment
  replicaCount: 1
  image:
    # -- Docker image for Flytepropeller deployment
    repository: cr.flyte.org/flyteorg/flytepropeller # FLYTEPROPELLER_IMAGE
    tag: v1.1.62 # FLYTEPROPELLER_TAG
    pullPolicy: IfNotPresent
  # -- Default resources requests and limits for Flytepropeller deployment
  resources:
    limits:
      cpu: 200m
      ephemeral-storage: 100Mi
      memory: 200Mi
    requests:
      cpu: 10m
      ephemeral-storage: 50Mi
      memory: 100Mi
  cacheSizeMbs: 0
  # -- Error reporting
  terminationMessagePolicy: FallbackToLogsOnError
  # -- Default regex string for searching configuration files
  configPath: /etc/flyte/config/*.yaml
  # -- Configuration for service accounts for FlytePropeller
  serviceAccount:
    # -- Should a service account be created for FlytePropeller
    create: true
    # -- Annotations for ServiceAccount attached to FlytePropeller pods
    annotations: {}
    # -- ImagePullSecrets to automatically assign to the service account
    imagePullSecrets: []
  # -- Annotations for Flytepropeller pods
  podAnnotations: {}
  # -- nodeSelector for Flytepropeller deployment
  nodeSelector: {}
  # -- tolerations for Flytepropeller deployment
  tolerations: []
  # -- affinity for Flytepropeller deployment
  affinity: {}
  # -- Appends extra command line arguments to the main command
  extraArgs: {}
  # -- Defines the cluster name used in events sent to Admin
  clusterName: ""
  # -- Sets priorityClassName for propeller pod(s).
  priorityClassName: ""
And what about the cluster-secret not being mounted correctly? Did we forget to configure something?
David Espejo (he/him)
02/20/2023, 7:47 PM
1. Have you configured the admin section in the flytepropeller configmap?
2. Are you able to retrieve the secrets from the data plane clusters?
kubectl get secrets -n flyte | grep flyteadmin-token
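For item 1, a minimal sketch of what that admin section could look like on the data plane cluster, assuming the control plane's flyteadmin is reachable at flyte.example.com:443 (a placeholder host): the endpoint has to name the control plane explicitly, otherwise propeller falls back to the in-cluster flyteadmin service name, which is exactly the DNS lookup failure in the log above.
admin:
  # gRPC endpoint of the control plane flyteadmin (placeholder host)
  endpoint: dns:///flyte.example.com:443
  insecure: false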
Ferdinand von den Eichen
02/21/2023, 2:14 PM
configmap:
  admin:
    admin:
      endpoint: flyte.<environment>.kineo.ai:443
and mounted several secrets that were missing in the guide.
domain: development
project: flytesnacks
value: team1
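The trailing domain/project/value block matches the attribute file flytectl expects for execution cluster labels; applying it would look roughly like this (assuming it is saved as ecl.yaml, a placeholder filename):
# Label flytesnacks/development executions with "team1"; flyteadmin then
# routes them to whatever clusters the labelClusterMap binds to that label
flytectl update execution-cluster-label --attrFile ecl.yaml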
Of course all of these runs fail, because the control plane cluster does not have access to the data. How can I make sure that runs don't get scheduled AT ALL when the target cluster is unavailable? What is the intended behaviour here? We want to scale clusters up and down dynamically…
Ketan (kumare3)
02/21/2023, 3:45 PM
Alex Papanicolaou
02/23/2023, 1:05 AM
We added
{{- with .Values.flyteadmin.additionalVolumeMounts -}}
{{ tpl (toYaml .) $ | nindent 10 }}
{{- end }}
to the volumeMounts section for both the admin and clusterresourcesync deployment specs.
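For orientation, this is roughly where that block lands in the chart's deployment template; a sketch only, with the surrounding fields illustrative rather than copied from the actual chart source:
      # Hypothetical excerpt of the patched flyteadmin deployment template
      containers:
        - name: flyteadmin
          volumeMounts:
          - name: config-volume
            mountPath: /etc/flyte/config
          {{- with .Values.flyteadmin.additionalVolumeMounts -}}
          {{ tpl (toYaml .) $ | nindent 10 }}
          {{- end }}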
Ferdinand von den Eichen
02/23/2023, 10:12 AM
In the clusterresourcesync deployment it seems that both the cluster-credentials and the flyte-admin-secrets are bound to the same /var/run/credentials. Isn't that an issue? Hopefully clusterresourcesync won't cause problems down the road…
Alex Papanicolaou
02/23/2023, 5:08 PM
And the same thing for the Volumes section of the clusterresourcesync deployment spec. Obviously you need the additionalVolumes to be able to have the `additionalVolumeMounts`…
{{- with .Values.flyteadmin.additionalVolumes -}}
{{ tpl (toYaml .) $ | nindent 8 }}
{{- end }}
Regarding /var/run/credentials, I do think I ran into that problem too. I used /etc instead. Here is an example of the cluster config file I used:
flyteadmin:
  additionalVolumes:
    - name: cluster-credentials-1
      secret:
        secretName: cluster-credentials-1
    - name: cluster-credentials-2
      secret:
        secretName: cluster-credentials-2
  additionalVolumeMounts:
    - name: cluster-credentials-1
      mountPath: /etc/credentials_1/
    - name: cluster-credentials-2
      mountPath: /etc/credentials_2/
configmap:
  clusters:
    labelClusterMap:
      cluster_1:
        - id: cluster_1
          weight: 1
      cluster_2:
        - id: cluster_2
          weight: 1
    clusterConfigs:
      - name: "cluster_1"
        endpoint: ...
        enabled: true
        auth:
          type: "file_path"
          tokenPath: "/etc/credentials_1/token"
          certPath: "/etc/credentials_1/cacert"
      - name: "cluster_2"
        endpoint: ...
        enabled: true
        auth:
          type: "file_path"
          tokenPath: "/etc/credentials_2/token"
          certPath: "/etc/credentials_2/cacert"
Regarding adding additionalVolumes and additionalVolumeMounts to clusterresourcesync: I'm reusing the same ones from the flyteadmin, so the more proper thing would be to have the cluster config define the same ones for admin and for clusterresourcesync.
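One way to confirm the mounts actually land as intended (and not all on a single /var/run/credentials path) is to inspect the rendered deployments; the deployment names below are the ones from this thread and may differ by chart version:
# Check which volumeMounts each deployment ended up with
kubectl get deployment flyteadmin -n flyte \
  -o jsonpath='{.spec.template.spec.containers[0].volumeMounts}'
kubectl get deployment syncresources -n flyte \
  -o jsonpath='{.spec.template.spec.containers[0].volumeMounts}'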