# ask-the-community
m
🤕 Issue: Use AWS Secrets Manager secrets in Flyte-core chart 🤕 Hi, our team is trying to use the AWS Secrets and Configuration Provider (ASCP) for the Kubernetes Secrets Store CSI Driver for our secrets in the `flyte-core` chart. We have defined our chart values as follows:
databaseSecret:
    name: db-pass
    volume:
      - name: db-pass
        csi:
          driver: secrets-store.csi.k8s.io
          readOnly: true
          volumeAttributes:
            secretProviderClass: aws-db-secrets
    secretManifest:
      apiVersion: secrets-store.csi.x-k8s.io/v1
      kind: SecretProviderClass
      metadata:
        name: aws-db-secrets
      spec:
        provider: aws
        parameters:
          objects: |
            - objectName: "<our secret arn>"
              jmesPath: 
                  - path: password
                    objectAlias: dbpassword
But `_helpers.tpl` overrides the Helm template with a volumes value that is not the one AWS requires. How can we avoid that? Do we need to change `_helpers.tpl` to add this option? Thank you 🫶
s
I believe so. Is it causing any failures in your deployment?
m
Yes, it cannot find my secret:
Warning  FailedMount  26m (x371 over 3d)     kubelet  Unable to attach or mount volumes: unmounted volumes=[db-pass], unattached volumes=[kube-api-access-jgj6s aws-iam-token db-pass config-volume]: timed out waiting for the condition
The pod volume should look like this to work with AWS Secrets Manager:
volumes:
      - name: db-pass
        csi:
          driver: secrets-store.csi.k8s.io
          readOnly: true
          volumeAttributes:
            secretProviderClass: aws-db-secrets
But `_helpers.tpl` automatically renders it like this:
volumes:
      - name: db-pass
        secret:
          secretName: db-pass
How can we override this default secret configuration?
s
@David Espejo (he/him), any idea how to fix this?
m
Hi, I solved my problem with this configuration:
databaseSecret:
    name: db-pass
    secretManifest:
      apiVersion: secrets-store.csi.x-k8s.io/v1
      kind: SecretProviderClass
      metadata:
        name: aws-db-secrets-spc
      spec:
        provider: aws
        parameters:
          objects: |
            - objectName: "{{ .Values.userSettings.db_secret }}"
              objectType: "secretsmanager"
              jmesPath: 
                  - path: password
                    objectAlias: dbpassword
        # Create k8s secret. It requires volume mount first in the pod and then sync.
        secretObjects:
          - secretName: db-pass
            type: Opaque
            data:
              - objectName: dbpassword
                key: dbpassword
With this, the AWS CSI driver creates a base k8s secret and it works 🤟
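For readers following along: assuming the driver's secret sync is enabled and a pod mounts the CSI volume, the `secretObjects` block above would be expected to produce a Secret roughly like the sketch below. The `flyte` namespace and the placeholder value are assumptions for illustration, not output captured from a cluster.

```yaml
# Sketch of the Kubernetes Secret synced by the CSI driver (illustrative only).
apiVersion: v1
kind: Secret
metadata:
  name: db-pass            # from secretObjects[].secretName
  namespace: flyte         # assumption: namespace where Flyte is installed
type: Opaque               # from secretObjects[].type
data:
  # The key comes from secretObjects[].data[].key; the value is the
  # base64-encoded content of the mounted object aliased as "dbpassword".
  dbpassword: <base64-encoded password>
```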
d
Thanks for sharing @Marti Jorda Roca, I was just starting to play with this 🙂 So, the `dbpassword` secret ends up mounted as the `db-pass` volume on `flyteadmin`, right?
m
Yes, that's it.
Sorry, it is also required to create another volume, more info:
datacatalog:

  # mount another db secret to activate secret-store-csi to create db-pass k8s secret.
  additionalVolumes:
    - name: aws-secret
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: aws-db-secrets-spc

  additionalVolumeMounts:
    - name: aws-secret
      mountPath: "/mnt/aws-secrets"
      readOnly: true
It's a little bit hacky.
The best would be to allow overriding `databaseSecret.volume`.
d
@Marti Jorda Roca agreed. Would you be up for creating an Issue to capture that request?
m
Hi, yes, I have opened this issue. If I need to change anything, just ask (it is my first issue 😅)
d
Thank you @Marti Jorda Roca. What are your thoughts on using this capability?: https://github.com/flyteorg/flyte/pull/3807 I know it's not the same, but in the meantime it is an option to avoid a plain-text DB password in the values.
nvm, you're using `flyte-core`
f
Hi @Marti Jorda Roca, thank you for sharing. I need to do the same thing for my Flyte deployment running on AWS EKS. May I ask a question? What is the value of `Values.userSettings.db_secret`? Is it the secret name defined in AWS Secrets Manager? Thanks a lot!
d
@Frank Shen Marti shared a workaround on this Issue in the meantime
f
Hi @Kevin Su, @Marti Jorda Roca, @David Espejo (he/him), what will happen when the database password gets changed (e.g. in case of rotation)? Will Marti’s implementation automatically re-sync the EKS secret?
m
Hi Frank, we have implemented rotation for our AWS Secret and it still works correctly
f
Hi @Marti Jorda Roca, is it really automatic? Do you have to re-start Flyte services?
m
So you need to deploy `secrets-store-csi-driver` and `secrets-provider-aws`:
# deploy secrets-store-csi-driver
- cd infrastructure/cluster/k8s_templates/secrets-store-csi-driver
- |
  helm upgrade --install csi-secrets-store \
  --namespace kube-system secrets-store-csi-driver/secrets-store-csi-driver \
  -f values.yaml --version $SECRET_STORE_VERSION
- cd ../../../..
          
# deploy secrets-provider-aws
- |
  helm upgrade --install secrets-provider-aws \
  --namespace kube-system aws-secrets-manager/secrets-store-csi-driver-provider-aws \
  --version $SECRET_PROVIDER_VERSION
Then configure the values of `secrets-store-csi-driver` as follows:
syncSecret:
  enabled: true

enableSecretRotation: true
The `secrets-store-csi-driver` will update the secret mounted in Flyte automatically; you don’t need to restart Flyte. Then configure the Flyte-core chart values as in the workaround on this Issue.
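A sketch of the driver values with the polling interval made explicit. `rotationPollInterval` is believed to be a value exposed by the secrets-store-csi-driver Helm chart, and the `2m` shown here is an assumption; both should be checked against the chart version in use.

```yaml
# values.yaml for the secrets-store-csi-driver chart (sketch, not verified
# against a specific chart version)
syncSecret:
  enabled: true              # allows the driver to create and update Kubernetes Secrets
enableSecretRotation: true   # periodically re-fetch the mounted secrets
rotationPollInterval: 2m     # assumption: how often the provider is polled
```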
f
Hi @Marti Jorda Roca, I am implementing your suggested solution and ran into issues. Is it possible for us to have a live session (e.g. slack huddle) next Tuesday? Thank you!
Hi @Kevin Su, @David Espejo (he/him), I am stuck. Are you able to help?
d
@Frank Shen could you share the problem/specific errors you're getting?
f
Hi @David Espejo (he/him), @Marti Jorda Roca, the error for the datacatalog pod is:
Warning  FailedMount  2s  kubelet  MountVolume.SetUp failed for volume "aws-secret" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod flyte/datacatalog-5df75c8455-ln79c, err: rpc error: code = Unknown desc = us-east-1: Failed fetching secret arn:aws:secretsmanager:us-east-1:178581358138:secret:database/service/mlforge/flyte/int/cyxd/aurorapostgres/v1/migrations-9YrdFD: WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.eks.us-east-1.amazonaws.com/id/235001DED36CBF377DFCFBF426B1CFE6
           status code: 400, request id: cb368731-4d56-4649-822b-a89a03784cd9
d
@Frank Shen can you verify the OIDC provider for your cluster?
aws eks describe-cluster --region <region> --name <Name-EKS-Cluster> --query "cluster.identity.oidc.issuer" --output text
m
As David says, the OIDC provider for the service account role your pod is using is not available.
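For context, with IAM Roles for Service Accounts the role is attached through an annotation on the pod's service account, and that role's trust policy must reference an OIDC provider registered in the same AWS account. A minimal sketch; the account ID and role name are placeholders, and the service account name and namespace are only illustrative assumptions:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: datacatalog          # illustrative
  namespace: flyte           # illustrative
  annotations:
    # Role assumed by the pod via sts:AssumeRoleWithWebIdentity
    eks.amazonaws.com/role-arn: "arn:aws:iam::<account-id>:role/<role-name>"
```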
f
The trust relationship of the pod’s IAM role looks like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::178581358138:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/235001DED36CBF377DFCFBF426B1CFE6"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/235001DED36CBF377DFCFBF426B1CFE6:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
And it matches the OIDC provider defined.
@David Espejo (he/him), the command `aws eks describe-cluster --region us-east-1 --name bolt-dp-us-east-1-int-1-v1 --query "cluster.identity.oidc.issuer" --output text` returns:
https://oidc.eks.us-east-1.amazonaws.com/id/235001DED36CBF377DFCFBF426B1CFE6
@Marti Jorda Roca, see above.
m
It looks good to me. Sorry, but I don’t know what’s wrong here 🥲
f
@David Espejo (he/him), @Marti Jorda Roca, I fixed the above error; it was caused by my using the wrong AWS account for my environment and had nothing to do with syncing the secret. Now I get a new error in the datacatalog pod:
Warning  FailedMount  11s                 kubelet            Unable to attach or mount volumes: unmounted volumes=[db-pass], unattached volumes=[db-pass config-volume kube-api-access-chn8t aws-iam-token aws-secret]: timed out waiting for the condition
Warning  FailedMount  7s (x9 over 2m14s)  kubelet            MountVolume.SetUp failed for volume "db-pass" : secret "db-pass" not found
And the pod failed to initialize.
My Helm chart values look like this:
datacatalog:
            additionalVolumes:
              - name: aws-secret
                csi:
                  driver: secrets-store.csi.k8s.io
                  readOnly: true
                  volumeAttributes:
                    secretProviderClass: flyte-secretproviderclass
            additionalVolumeMounts:
              - name: aws-secret
                mountPath: "/ect/aws-secrets"
                readOnly: true
...
          common:
            ### This secret needs to exist in the flyte namespace prior.
            databaseSecret:
              name: db-pass
              secretManifest:
                apiVersion: secrets-store.csi.x-k8s.io/v1
                kind: SecretProviderClass
                metadata:
                  name: flyte-secretproviderclass
                spec:
                  provider: aws
                  parameters:
                    objects: |
                      - objectName: "{{ .Values.userSettings.dbSecretArn }}"  # I used the SM secret ARN
                        objectType: "secretsmanager"
                        jmesPath:
                            - path: password   # the secret JSON does have 'password' as a field
                              objectAlias: dbpassword
                  # Create k8s secret. It requires volume mount first in the pod and then sync.
                  secretObjects:
                    - secretName: db-pass
                      type: Opaque
                      data:
                        - objectName: dbpassword
                          key: pass.txt   # this line is different from Marti's original
Name:             datacatalog-7cf54d8fc9-tjjhw
Namespace:        flyte
Priority:         0
Service Account:  datacatalog
Node:             ip-100-72-230-230.ec2.internal/100.72.230.230
Start Time:       Tue, 16 Jan 2024 14:15:48 -0800
Labels:           app.kubernetes.io/instance=flyte
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=datacatalog
                  helm.sh/chart=flyte-core-3.0.5
                  pod-template-hash=7cf54d8fc9
Annotations:      configChecksum: 50d02ef3537a5d9f111aa8b8db9e65061685a8d8d1c1c52fdef322c0491ed4a
                  kubectl.kubernetes.io/restartedAt: 2024-01-16T14:15:48-08:00
Status:           Pending
IP:               
IPs:              <none>
Controlled By:    ReplicaSet/datacatalog-7cf54d8fc9
Init Containers:
  run-migrations:
    Container ID:  
    Image:         cr.flyte.org/flyteorg/datacatalog:v1.9.37
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      datacatalog
      --config
      /etc/datacatalog/config/*.yaml
      migrate
      run
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:
      AWS_STS_REGIONAL_ENDPOINTS:   regional
      AWS_DEFAULT_REGION:           us-east-1
      AWS_REGION:                   us-east-1
      AWS_ROLE_ARN:                 arn:aws:iam::178581358138:role/flyte-role
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /etc/datacatalog/config from config-volume (rw)
      /etc/db from db-pass (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gwf5l (ro)
Containers:
  datacatalog:
    Container ID:  
    Image:         cr.flyte.org/flyteorg/datacatalog:v1.9.37
    Image ID:      
    Ports:         8088/TCP, 8089/TCP, 10254/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Command:
      datacatalog
      --config
      /etc/datacatalog/config/*.yaml
      serve
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:                1
      ephemeral-storage:  200Mi
      memory:             500Mi
    Requests:
      cpu:                500m
      ephemeral-storage:  200Mi
      memory:             200Mi
    Environment:
      AWS_STS_REGIONAL_ENDPOINTS:   regional
      AWS_DEFAULT_REGION:           us-east-1
      AWS_REGION:                   us-east-1
      AWS_ROLE_ARN:                 arn:aws:iam::178581358138:role/flyte-role
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /ect/aws-secrets from aws-secret (ro)
      /etc/datacatalog/config from config-volume (rw)
      /etc/db from db-pass (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gwf5l (ro)
Conditions:
  Type              Status
  Initialized       False 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  aws-iam-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  db-pass:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  db-pass
    Optional:    false
  shared-data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      datacatalog-config
    Optional:  false
  aws-secret:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            secrets-store.csi.k8s.io
    FSType:            
    ReadOnly:          true
    VolumeAttributes:      secretProviderClass=flyte-secretproviderclass
  kube-api-access-gwf5l:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    42s                default-scheduler  Successfully assigned flyte/datacatalog-7cf54d8fc9-tjjhw to ip-100-72-230-230.ec2.internal
  Warning  FailedMount  10s (x7 over 41s)  kubelet            MountVolume.SetUp failed for volume "db-pass" : secret "db-pass" not found
Hi @Marti Jorda Roca, @David Espejo (he/him), @Kevin Su I don’t understand the purpose of the following under secretManifest:
# Create k8s secret. It requires volume mount first in the pod and then sync.
                  secretObjects:
                    - secretName: db-pass
                      type: Opaque
                      data:
                        - objectName: dbpassword
                          key: dbpassword
Could you explain?
It does seem to me that the error
MountVolume.SetUp failed for volume "db-pass" : secret "db-pass" not found
is valid, since I changed the definition of common/databaseSecret/name: db-pass and the k8s secret named db-pass will not be created by the flyte charts.
According to this https://secrets-store-csi-driver.sigs.k8s.io/topics/sync-as-kubernetes-secret, I don’t see what I am doing wrong.
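One way to narrow this down is to check the sync independently of the Flyte chart. Per the linked page, the driver only creates the Kubernetes Secret once a pod mounting the `SecretProviderClass` volume has started, and only when secret sync is enabled in the driver installation. A minimal sketch of a throwaway pod for that check; the pod name, image, and mount path are hypothetical, while the service account and `SecretProviderClass` name are taken from the values and pod description above:

```yaml
# Throwaway pod that only mounts the CSI volume (sketch, names are hypothetical).
apiVersion: v1
kind: Pod
metadata:
  name: spc-sync-check               # hypothetical name
  namespace: flyte
spec:
  serviceAccountName: datacatalog    # reuses the IAM role bound to this service account
  containers:
    - name: sleeper
      image: busybox:1.36            # assumption: any small image works
      command: ["sleep", "3600"]
      volumeMounts:
        - name: aws-secret
          mountPath: /mnt/aws-secrets
          readOnly: true
  volumes:
    - name: aws-secret
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: flyte-secretproviderclass
```

If `db-pass` still does not appear in the namespace while this pod is running, the sync settings of the driver (for example `syncSecret.enabled`) are a likely place to look.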
This is the helm deployment for datacatalog:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "10"
    meta.helm.sh/release-name: flyte
    meta.helm.sh/release-namespace: flyte
  creationTimestamp: "2023-08-11T17:44:54Z"
  generation: 10
  labels:
    app.kubernetes.io/instance: flyte
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: datacatalog
    helm.sh/chart: flyte-core-3.0.5
    helm.toolkit.fluxcd.io/name: flyte
    helm.toolkit.fluxcd.io/namespace: flyte
  name: datacatalog
  namespace: flyte
  resourceVersion: "1309685335"
  uid: 3c621765-3c6f-4078-a12e-735ec114fe71
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: flyte
      app.kubernetes.io/name: datacatalog
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        configChecksum: 50d02ef3537a5d9f111aa8b8db9e65061685a8d8d1c1c52fdef322c0491ed4a
        kubectl.kubernetes.io/restartedAt: "2024-01-16T14:15:48-08:00"
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: flyte
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: datacatalog
        helm.sh/chart: flyte-core-3.0.5
    spec:
      containers:
      - command:
        - datacatalog
        - --config
        - /etc/datacatalog/config/*.yaml
        - serve
        image: cr.flyte.org/flyteorg/datacatalog:v1.9.37
        imagePullPolicy: IfNotPresent
        name: datacatalog
        ports:
        - containerPort: 8088
          protocol: TCP
        - containerPort: 8089
          protocol: TCP
        - containerPort: 10254
          protocol: TCP
        resources:
          limits:
            cpu: "1"
            ephemeral-storage: 200Mi
            memory: 500Mi
          requests:
            cpu: 500m
            ephemeral-storage: 200Mi
            memory: 200Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/db
          name: db-pass
        - mountPath: /etc/datacatalog/config
          name: config-volume
        - mountPath: /ect/aws-secrets
          name: aws-secret
          readOnly: true
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - datacatalog
        - --config
        - /etc/datacatalog/config/*.yaml
        - migrate
        - run
        image: cr.flyte.org/flyteorg/datacatalog:v1.9.37
        imagePullPolicy: IfNotPresent
        name: run-migrations
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/db
          name: db-pass
        - mountPath: /etc/datacatalog/config
          name: config-volume
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1001
        fsGroupChangePolicy: OnRootMismatch
        runAsUser: 1001
      serviceAccount: datacatalog
      serviceAccountName: datacatalog
      terminationGracePeriodSeconds: 30
      volumes:
      - name: db-pass
        secret:
          defaultMode: 420
          secretName: db-pass
      - emptyDir: {}
        name: shared-data
      - configMap:
          defaultMode: 420
          name: datacatalog-config
        name: config-volume
      - csi:
          driver: secrets-store.csi.k8s.io
          readOnly: true
          volumeAttributes:
            secretProviderClass: flyte-secretproviderclass
        name: aws-secret
status:
  conditions:
  - lastTransitionTime: "2023-08-11T17:44:54Z"
    lastUpdateTime: "2023-08-11T17:44:54Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2024-01-16T22:25:49Z"
    lastUpdateTime: "2024-01-16T22:25:49Z"
    message: ReplicaSet "datacatalog-7cf54d8fc9" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 10
  replicas: 3
  unavailableReplicas: 3
  updatedReplicas: 1