#ask-the-community

Frank Shen

03/15/2023, 6:08 PM
However, when I test with a key that doesn’t exist, the task is stuck in the initializing state forever. Does anyone have an idea how to prevent this? I’d rather the task fail immediately. CC @Kamakshi Muthukrishnan
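from typing import Tuple
from flytekit import Secret, current_context, task
# FEATHR_SECRET_GROUP is defined elsewhere in the module (not shown in the original post)
@task(secret_requests=[Secret(group=FEATHR_SECRET_GROUP, key='xxx'), Secret(group=FEATHR_SECRET_GROUP, key='yyy')])
def get_feathr_s3_secrets() -> Tuple[str, str]:
    context = current_context()
    s3_access_key = context.secrets.get(FEATHR_SECRET_GROUP, 'xxx')
    s3_secret_key = context.secrets.get(FEATHR_SECRET_GROUP, 'yyy')
    return s3_access_key, s3_secret_key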

Niels Bantilan

03/15/2023, 6:10 PM
hi Frank, this is an issue I just noticed today… would you mind opening up a [flyte-bug] issue for this? 👇

Kevin Su

03/15/2023, 6:13 PM
which version of propeller are you using? we have changed the default webhook failure policy to fail, so the task should fail immediately if secret not found. https://github.com/flyteorg/flytepropeller/pull/473

Frank Shen

03/15/2023, 6:45 PM
@Kevin Su, what version of propeller is this fix in?
We will upgrade to that version.

Kevin Su

03/15/2023, 6:46 PM
v1.1.34+

Frank Shen

03/15/2023, 6:49 PM
Thanks @Kevin Su!
@Kevin Su, I confirmed that our Propeller is already 1.1.38. However, the task still hangs. What could be the reason?

Kevin Su

03/15/2023, 9:49 PM
cc @Dan Rammer (hamersaw)

Dan Rammer (hamersaw)

03/16/2023, 5:14 PM
@Frank Shen can you dump the k8s mutating webhook? Something like
kubectl -n flyte get mutatingwebhookconfigurations flyte-sandbox-local -o yaml
should get this.
Also, looking at the Pod would be useful as well
kubectl -n flytesnacks-development get pod <pod-name> -o yaml

Frank Shen

03/16/2023, 5:22 PM
Hi @Kamakshi Muthukrishnan, could you please get Dan Rammer the information he requested, to help troubleshoot the issue where an incorrect key name hangs the task?

karthikraj

03/16/2023, 6:15 PM
@Dan Rammer (hamersaw)
$ kubectl -n flyte get mutatingwebhookconfigurations flyte-sandbox-local -o yaml
Error from server (NotFound): mutatingwebhookconfigurations.admissionregistration.k8s.io "flyte-sandbox-local" not found
But running it without flyte-sandbox-local gives the following:
$ kubectl -n flyte get mutatingwebhookconfigurations
NAME                            WEBHOOKS   AGE
aws-load-balancer-webhook       2          275d
datadog-webhook                 2          269d
flyte-pod-webhook               1          268d
istio-sidecar-injector          4          265d
pod-identity-webhook            1          277d
vpc-resource-mutating-webhook   1          277d
The following also works:
kubectl -n flyte get mutatingwebhookconfigurations -o yaml
@Frank Shen, could you message me directly with the link to a workflow that hits this scenario? I need the pod ID to run the kubectl command that @Dan Rammer (hamersaw) mentioned.

Dan Rammer (hamersaw)

03/16/2023, 6:33 PM
can you replace the flyte-sandbox-local with flyte-pod-webhook?

karthikraj

03/16/2023, 8:05 PM
@Dan Rammer (hamersaw) Here is the output:
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  creationTimestamp: "2022-06-20T18:29:21Z"
  generation: 177
  name: flyte-pod-webhook
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: flyte-pod-webhook-756bd668
    uid: 7512f77a-b259-4d45-87e7-5e8d24585706
  resourceVersion: "198242460"
  uid: e4643e72-f5ce-46e2-a584-a4d95e5c0f44
webhooks:
- admissionReviewVersions:
  - v1
  - v1beta1
  clientConfig:
    caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUZCVENDQXUyZ0F3SUJBZ0lDQitVd0RRWUpLb1pJaHZjTkFRRUxCUUF3RkRFU01CQUdBMVVFQ2hNSlpteDUKZEdVdWIzSm5NQjRYRFRJek1ETXhOREUxTkRrd05sb1hEVEkwTURNeE5ERTFORGt3Tmxvd0ZERVNNQkFHQTFVRQpDaE1KWm14NWRHVXViM0puTUlJQ0lqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQ0FnOEFNSUlDQ2dLQ0FnRUF6andvCjRLTTFaWExWRDVMUjNOazdwb0RkaHQxUkpSd1V5NVNDN1ZIVFBVZHZKalNYeFF1MEp6Z1FPUzh3dDF4QkRwdTYKOWZIdklkRGpCN3JJRGxLdXhrcVloNmM5ellhNmVvWWhkSU92b2E1dkpXSDFwR3FpYjJSQ1E1WmZZSllGeElBZwpjd3JVZmozQ1dCandyUkZYVFFXS3h4aXdiUGhsZWJsWWhnc3lWVCs0Z3VpTzk0Y1FrT1FSZENmMTNRaXNVTTk0CnpicnplSmhjTythNTJPbGgybERLT2lLamxUYWRwTUp1dDhSaDMxZnZER204YnpCYkhnQmZLNHdaSUpMYXhMR0UKaDdXZURlTGcvMlpvVi9yTUIydTVvZ0h2WEJVUTVIZVQwNzRDalBKalEyeENiVGNVSFpBcWd3MlVwOEpVYjFlTgpKVVhkU3phcG50WldTUVpwakVyZm9ld040dnkrdUZwUTFNRW1CekFkYXRqMmIxQTM3amk3UXZkSXlUUGFCVC9CClFoei95cXVQT05oOVVzTEhwNjVPWnVBZ1pGeGg0ZlNudlZqVWdaN2E0M3RBVC85YmRDbk5nQUZ6MjBJbjM1aXgKUmZ4NmlNSG5BWHpHNnZaaUpCd3dmUVVpaTkrUFV5dEJ6NUZiL2N0cmhaNVZGenZZNjVHRVhqWXRVV3htSG9pbAprMVpDTk9nci9wWXRUS01FQ0pScGVQWEx1RjExS25KL3o5ZlJzenFHMmtYY1RoMlBNdTVNYlI3ZG8zT01HT1pXCjVSZ0tIVXJtMkhOQVpUR3h6cEl4aDB3Z3d4RFJPZWJNY3VGblYwNWM2d2pNTGYxRlM1NUdOQnlSKzBVbi9NL0wKemxiZW5WVGNxYXdhRkNwZG9xS3MzTy9uaGxBNkRnRkZuR3RsbkNVQ0F3RUFBYU5oTUY4d0RnWURWUjBQQVFILwpCQVFEQWdLRU1CMEdBMVVkSlFRV01CUUdDQ3NHQVFVRkJ3TUNCZ2dyQmdFRkJRY0RBVEFQQmdOVkhSTUJBZjhFCkJUQURBUUgvTUIwR0ExVWREZ1FXQkJSNFA3SDBZQm5kdzhvWkZ6V2l4WW80SVZVRndUQU5CZ2txaGtpRzl3MEIKQVFzRkFBT0NBZ0VBZGRRNlJZaFo2NlJGclFXaHJES0s4UXFVU2FzRDBMbDFwMzBraVEzYVJONDJaQnR0L1BYeQpJa2t1Z0VuaTk2K2I0SWkzUDhQNHZTM082d1FzYThqMGhhbEFWVDI5bW9GNjVPWlhIbm1lSzA5N3dKZmZlQk1LCk1ORWNGaW8zRHdFWFVPbCtpRWgzZGk2WDkrVHA2Lzg4dzJNZG1PM01MWXRsSHBVNkMveEF2WUYvaFp3cmJjZjYKVXE0TUJUbUcrcnJ6YWZ4RU9FUllZeTdSM2s0RzRORUIxdFBVVVFyR2swZDFxeDhmQ3N3ak5RUklFWlRzajE5cQowQUxWYVdneTN5cys2dnVlMVB0d3pLU05MM0xuaGFYZ3hrb0dReHVBVkdoT21OK2YzUU9ENUFkTUhsNHZKeVZ2CnhISVdvRFdFbXoyZDBKTFpjRmg1OWFZVlo5VFVDbkJ3cFdwOVdtMEFLdXlYaXBHbk0vaW51cmpvYTRRdEJPbksKUzRVektlc3dCOG9GK093SXZIZWpVc2hyemRFOGM4dT
ZsbWhkQStWVHRhTXNDeTJ5Qm5JajFvZFEvcjlkR3JkLwpxc0tNOWdXVlFRSTV1d3NPT1l6ZjVNaWEzaTNxOHN6QlNIV3hpNGZ4WDl0MSs4R0NRQ3F5WkpmQXFUMWVKZEFTCms3WmpJaW5Oa2NLcVRSSnluYXYxeDVzd1cvUjE5cGg4eTh5dWhIazJmVjdTVFU2VzRJTnVZUTRKNWVOMXA5WXEKSFNxdXBZS3c2K2RJMjBpQkxod1JTaWNYc1o2NFA4V0VvWUwvL2Y5VnA2cXVYRjFhU1pLY1REMzV6M1FTd0d6SQpBOUpBWEkvWUJMT3U4U04yL24vVVQ3NHRsalpKdnY2MDBHWHpBdWFmZ2d4N0RyK0liTlZQNlYwPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    service:
      name: flyte-pod-webhook
      namespace: flyte
      path: /mutate--v1-pod
      port: 443
  failurePolicy: Fail
  matchPolicy: Equivalent
  name: flyte-pod-webhook.flyte.org
  namespaceSelector: {}
  objectSelector:
    matchLabels:
      inject-flyte-secrets: "true"
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - '*'
    apiVersions:
    - v1
    operations:
    - CREATE
    resources:
    - pods
    scope: '*'
  sideEffects: NoneOnDryRun
  timeoutSeconds: 10

Dan Rammer (hamersaw)

03/16/2023, 10:28 PM
Ok, this looks correct. So the Pod is being created with a secret request for a secret that does not exist. Can you dump the pod in the same way?
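For readers following along, here is a minimal Python sketch of how a `matchLabels` object selector like the one in the dump decides whether the webhook is invoked for a pod. The function name `selector_matches` and the example pod labels are illustrative assumptions, not Flyte or Kubernetes APIs; only the `failurePolicy` and `objectSelector` values are copied from the dump above. Note that `failurePolicy: Fail` governs what happens when the webhook call itself errors or times out; it does not by itself verify that the requested secret exists.

```python
from typing import Dict


def selector_matches(match_labels: Dict[str, str], pod_labels: Dict[str, str]) -> bool:
    """Return True when every matchLabels pair is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


# Trimmed copy of the relevant fields from the webhook dump in this thread.
webhook = {
    "failurePolicy": "Fail",
    "objectSelector": {"matchLabels": {"inject-flyte-secrets": "true"}},
}

# Hypothetical pod labels, for illustration only.
labelled_pod = {"inject-flyte-secrets": "true", "project": "flytesnacks"}
unlabelled_pod = {"project": "flytesnacks"}

match_labels = webhook["objectSelector"]["matchLabels"]
print(selector_matches(match_labels, labelled_pod))
print(selector_matches(match_labels, unlabelled_pod))
```

A pod without the `inject-flyte-secrets: "true"` label would bypass the webhook entirely, which is why the pod dump is the next thing worth checking.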

karthikraj

03/17/2023, 9:09 PM
@Dan Rammer (hamersaw) I have posted the pod info below, as requested. The pod is stuck in the ContainerCreating state indefinitely.
fefef3595b7db4491bde-n0-0          0/1     ContainerCreating   0          8m17s
$ kubectl get pod fefef3595b7db4491bde-n0-0 -n flytesnacks-development -o yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    flyte.secrets/s0: m4zg54lqhiqceztfmf1gq3rcbjxxxxj1earhq5dyeifa
    flyte.secrets/s1: m4zg54lqhiqceztfmf1gq3rcbjxxxxj1earhs5lzeifa
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2023-03-17T20:59:07Z"
  labels:
    domain: development
    execution-id: fefef3595b7db4491bde
    inject-flyte-secrets: "true"
    interruptible: "false"
    node-id: n0
    project: flytesnacks
    shard-key: "5"
    task-name: ml-project-1-test-aws-secrets-get-feathr-s3-secrets
    workflow-name: ml-project-1-test-aws-secrets-wf
  name: fefef3595b7db4491bde-n0-0
  namespace: flytesnacks-development
  ownerReferences:
  - apiVersion: flyte.lyft.com/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: flyteworkflow
    name: fefef3595b7db4491bde
    uid: 6a7705a2-92d2-4371-8055-e6fe10d94081
  resourceVersion: "200937396"
  uid: 30d7f362-db6d-40a2-8d99-9a05e6f2cd54
spec:
  containers:
  - args:
    - pyflyte-fast-execute
    - --additional-distribution
    - s3://dev-wm-max-ml-flyte-us-east-1/zg/flytesnacks/development/OLE435OTTPQTX3Q5NIXHMARUPM======/scriptmode.tar.gz
    - --dest-dir
    - /root
    - --
    - pyflyte-execute
    - --inputs
    - s3://dev-wm-max-ml-flyte-us-east-1/metadata/propeller/flytesnacks-development-fefef3595b7db4491bde/n0/data/inputs.pb
    - --output-prefix
    - s3://dev-wm-max-ml-flyte-us-east-1/metadata/propeller/flytesnacks-development-fefef3595b7db4491bde/n0/data/0
    - --raw-output-data-prefix
    - s3://dev-wm-max-ml-flyte-us-east-1/j5/fefef3595b7db4491bde-n0-0
    - --checkpoint-path
    - s3://dev-wm-max-ml-flyte-us-east-1/j5/fefef3595b7db4491bde-n0-0/_flytecheckpoints
    - --prev-checkpoint
    - '""'
    - --resolver
    - flytekit.core.python_auto_container.default_task_resolver
    - --
    - task-module
    - ml_project_1.test_aws_secrets
    - task-name
    - get_feathr_s3_secrets
    env:
    - name: FLYTE_INTERNAL_EXECUTION_WORKFLOW
      value: flytesnacks:development:ml_project_1.test_aws_secrets.wf
    - name: FLYTE_INTERNAL_EXECUTION_ID
      value: fefef3595b7db4491bde
    - name: FLYTE_INTERNAL_EXECUTION_PROJECT
      value: flytesnacks
    - name: FLYTE_INTERNAL_EXECUTION_DOMAIN
      value: development
    - name: FLYTE_ATTEMPT_NUMBER
      value: "0"
    - name: FLYTE_INTERNAL_TASK_PROJECT
      value: flytesnacks
    - name: FLYTE_INTERNAL_TASK_DOMAIN
      value: development
    - name: FLYTE_INTERNAL_TASK_NAME
      value: ml_project_1.test_aws_secrets.get_feathr_s3_secrets
    - name: FLYTE_INTERNAL_TASK_VERSION
      value: MIa3ccihed0f1-HxhO3mvw==
    - name: FLYTE_INTERNAL_PROJECT
      value: flytesnacks
    - name: FLYTE_INTERNAL_DOMAIN
      value: development
    - name: FLYTE_INTERNAL_NAME
      value: ml_project_1.test_aws_secrets.get_feathr_s3_secrets
    - name: FLYTE_INTERNAL_VERSION
      value: MIa3ccihed0f1-HxhO3mvw==
    - name: DEFAULT_ENV_VAR
      value: VALUE
    - name: MY_NAME
      value: KARTHIKRAJ
    - name: FLYTE_SECRETS_DEFAULT_DIR
      value: /etc/flyte/secrets
    - name: FLYTE_SECRETS_FILE_PREFIX
    - name: AWS_DEFAULT_REGION
      value: us-east-1
    - name: AWS_REGION
      value: us-east-1
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::613630599026:role/dev-xxx-xx-platform-xxxx-role
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    image: ghcr.io/flyteorg/flytekit:py3.8-1.2.4
    imagePullPolicy: IfNotPresent
    name: fefef3595b7db4491bde-n0-0
    resources:
      limits:
        cpu: "2"
        memory: 20Gi
      requests:
        cpu: "2"
        memory: 20Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-k96xl
      readOnly: true
    - mountPath: /etc/flyte/secrets/feathr
      name: mzswc4diojpq
      readOnly: true
    - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      name: aws-iam-token
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-10-69-46-21.ec2.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: aws-iam-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: sts.amazonaws.com
          expirationSeconds: 86400
          path: token
  - name: kube-api-access-k96xl
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
  - name: mzswc4diojpq
    secret:
      defaultMode: 420
      items:
      - key: xxx
        path: xxx
      - key: yyy
        path: yyy
      secretName: feathr
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-03-17T20:59:07Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-03-17T20:59:07Z"
    message: 'containers with unready status: [fefef3595b7db4491bde-n0-0]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-03-17T20:59:07Z"
    message: 'containers with unready status: [fefef3595b7db4491bde-n0-0]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-03-17T20:59:07Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: ghcr.io/flyteorg/flytekit:py3.8-1.2.4
    imageID: ""
    lastState: {}
    name: fefef3595b7db4491bde-n0-0
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 10.69.46.21
  phase: Pending
  qosClass: Guaranteed
  startTime: "2023-03-17T20:59:07Z"
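As a rough illustration of how the status block above could be checked programmatically, the following Python sketch flags a pod that has been waiting in `ContainerCreating` longer than a threshold. The function name `is_stuck_creating` and the five-minute default are assumptions made for this example; the `pod` dictionary is a trimmed copy of the fields in the dump, and `now` is chosen to match the 8m17s age shown earlier.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict


def is_stuck_creating(
    pod: Dict[str, Any],
    now: datetime,
    max_wait: timedelta = timedelta(minutes=5),  # assumed threshold
) -> bool:
    """Return True if the pod is Pending with a container waiting in
    ContainerCreating for longer than max_wait."""
    status = pod.get("status", {})
    if status.get("phase") != "Pending":
        return False
    creating = any(
        cs.get("state", {}).get("waiting", {}).get("reason") == "ContainerCreating"
        for cs in status.get("containerStatuses", [])
    )
    start = status.get("startTime")
    if not creating or not start:
        return False
    # Kubernetes timestamps in this dump are UTC, formatted like 2023-03-17T20:59:07Z.
    started = datetime.strptime(start, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return now - started > max_wait


# Trimmed copy of the status fields from the pod dump in this thread.
pod = {
    "status": {
        "phase": "Pending",
        "startTime": "2023-03-17T20:59:07Z",
        "containerStatuses": [
            {"state": {"waiting": {"reason": "ContainerCreating"}}}
        ],
    }
}

# 8m17s after startTime, matching the AGE column shown earlier.
now = datetime(2023, 3, 17, 21, 7, 24, tzinfo=timezone.utc)
print(is_stuck_creating(pod, now))
```

A check like this only detects the symptom; it does not say why the container cannot be created.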

Dan Rammer (hamersaw)

03/20/2023, 4:15 PM
OK, @karthikraj everything looks correct. It seems this is just the default behavior of k8s, namely to wait until the secret exists before starting the container. Do you mind filing an issue for this? [flyte-feature] I'm wondering if there is a configuration option we can set on the pods to make them fail immediately rather than being stuck in an initializing state.
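Until such an option exists, one possible workaround is to compare the keys a pod's secret volumes ask for against the keys the secret actually holds, before relying on the task. The Python sketch below shows that comparison. It is not a Flyte or Kubernetes API: `missing_secret_keys` is a hypothetical helper, the volume spec is copied from the pod dump above, and the key names in `existing_keys` are made up for illustration.

```python
from typing import Any, Dict, Iterable, List


def missing_secret_keys(
    pod_spec: Dict[str, Any],
    existing_keys: Dict[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Map each secret name to the requested keys that it does not contain."""
    missing: Dict[str, List[str]] = {}
    for volume in pod_spec.get("volumes", []):
        secret = volume.get("secret")
        # Skip non-secret volumes and secrets explicitly marked optional.
        if not secret or secret.get("optional"):
            continue
        name = secret["secretName"]
        available = set(existing_keys.get(name, ()))
        absent = [
            item["key"]
            for item in secret.get("items", [])
            if item["key"] not in available
        ]
        if absent:
            missing.setdefault(name, []).extend(absent)
    return missing


# Secret volume copied from the pod dump in this thread.
pod_spec = {
    "volumes": [
        {
            "name": "mzswc4diojpq",
            "secret": {
                "secretName": "feathr",
                "items": [
                    {"key": "xxx", "path": "xxx"},
                    {"key": "yyy", "path": "yyy"},
                ],
            },
        }
    ]
}

# Hypothetical: the keys that really exist in the "feathr" secret.
existing_keys = {"feathr": ["s3_access_key", "s3_secret_key"]}

print(missing_secret_keys(pod_spec, existing_keys))
```

The `existing_keys` mapping could be built by hand or from the `data` field of `kubectl get secret feathr -o json`; either way, a non-empty result points at the key names to fix in `secret_requests`.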