• Matheus Moreno

    Matheus Moreno

    1 month ago
    Hey, everyone! I need help with one more thing. Almost everything works in our (GCP) cluster; the only thing that doesn't seem to work is secret injection. The pod is initialized with flyte.secrets/sX annotations, but the task doesn't actually find any secret. How can I debug this? A bit more context: we are injecting environment variables and service account JSONs into the tasks. Because of that, we retrieve secrets directly from /etc/flyte/secrets, since SecretsManager() ends up applying upper() and lower() to the keys, which messes up the configuration of the env vars (and files). But apparently no secret is being added to this path...
  • By the way, no environment variables for the secrets (like FLYTE_SECRETS_DEFAULT_DIR) are being set. Maybe it's a problem with the webhook?
  • Yeah, no luck with the secrets being mounted:
    flyte@afdxj58wfblmgn52glcb-n1-0:/$ cd /etc/flyte/secrets
    bash: cd: /etc/flyte/secrets: No such file or directory
    flyte@afdxj58wfblmgn52glcb-n1-0:/$ cd /etc/secrets
    bash: cd: /etc/secrets: No such file or directory
  • Anyone? No luck debugging it 😞
  • k

    katrina

    1 month ago
    @Yee
  • Yee

    Yee

    1 month ago
    your webhook is running right?
  • Matheus Moreno

    Matheus Moreno

    1 month ago
    yes
  • Yee

    Yee

    1 month ago
    anything amiss in the logs?
  • Matheus Moreno

    Matheus Moreno

    1 month ago
    message has been deleted
  • message has been deleted
  • just this
  • this is on flyteadmin
  • and on the console I can see on the task definition that the secrets were requested
  • Yee

    Yee

    1 month ago
    sorry i haven’t played with this before. let me do some more digging
  • Matheus Moreno

    Matheus Moreno

    4 weeks ago
    if you need me to log or run anything please let me know
  • this seems to be the only thing not working in the cluster right now
  • Yee

    Yee

    4 weeks ago
    yeah sorry still digging
  • also digging into the ticket you submitted
  • erm… a while back
  • logger:
      level: 5
      show-source: true
    that is the bit to add for logging
  • were you able to add this logging @Matheus Moreno
  • i added it to my local sandbox while debugging the other issue you raised.
  • those tasks are still working for me, but there are also some red herring error messages in the webhook log. at least i’m assuming they must be, since it’s working
  • {"json":{"src":"secrets.go:54"},"level":"info","msg":"Failed to inject a secret using injector [Global]. Error: secrets not found - Env [FLYTE_SECRET_TEST-GROUP_TEST-ENV], file [/etc/secrets/test-group/test-env]","ts":"2022-07-18T20:03:47Z"}
    {"json":{"src":"secrets.go:54"},"level":"info","msg":"Failed to inject a secret using injector [Global]. Error: secrets not found - Env [FLYTE_SECRET_TEST-GROUP_TEST-FILE], file [/etc/secrets/test-group/test-file]","ts":"2022-07-18T20:03:47Z"}
  • that is what i’m seeing locally in sandbox.
  • but also
    {
      "o0": "Hello world, these are my secrets: TESTING_ENV / TESTING_FILE"
    }
  • so it just writes an error log for all the failed attempts
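The two log lines above suggest that, for each requested secret, the Global injector tries an environment variable first and then a file. A minimal Python sketch of that naming scheme follows. The `FLYTE_SECRET_` prefix, the `/etc/secrets` base directory and the upper-cased env name are taken from the log lines; lower-casing the file path is only an assumption based on the upper()/lower() remark earlier in the thread, and the function names are hypothetical, not Flyte's API.

```python
import os
import posixpath
from typing import Optional, Tuple


def candidate_names(group: str, key: str,
                    env_prefix: str = "FLYTE_SECRET_",
                    base_dir: str = "/etc/secrets") -> Tuple[str, str]:
    """Return the (env var name, file path) pair tried for one secret."""
    # Env name as seen in the webhook log: prefix + GROUP + "_" + KEY.
    env_name = f"{env_prefix}{group.upper()}_{key.upper()}"
    # File path as seen in the log: <base_dir>/<group>/<key>.
    # Lower-casing here is an assumption, not confirmed by the thread.
    file_path = posixpath.join(base_dir, group.lower(), key.lower())
    return env_name, file_path


def find_secret(group: str, key: str, **kwargs) -> Optional[str]:
    """Look up a secret as an env var first, then as a file; None if absent."""
    env_name, file_path = candidate_names(group, key, **kwargs)
    if env_name in os.environ:
        return os.environ[env_name]
    if os.path.isfile(file_path):
        with open(file_path) as handle:
            return handle.read()
    return None
```

Calling `candidate_names("test-group", "test-env")` reproduces the pair printed in the first log line, which makes it easy to compare what a task expects with what is actually present in the container.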
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    yeah i'm trying it out right now
  • Nothing is appearing in the logs. I don't know if the update actually worked
  • the update = changing the logger level
  • Yee

    Yee

    3 weeks ago
    did you restart the pod?
  • the webhook pod
  • which config did you update?
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    I added this to values.yaml
    configmap:
      logger:
        level: 6
        show-source: true
  • the flyte-propeller-config is updated on my k8s
  • Yee

    Yee

    3 weeks ago
    yeah so on sandbox, my propeller config looks like this after modifying
  • storage.yaml: |
        logger:
          level: 5
          show-source: true
        storage:
          type: minio
          ...
  • it doesn’t matter which yaml you put it in.
  • i just picked storage cuz it was at the bottom of the screen
  • but the webhook pod will need to be restarted after you update the configmap
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    hang on i think i messed it up
  • /etc/flyte/config $ cat logger.yaml 
    level: 6
    show-source: true
    /etc/flyte/config $
  • its like this inside the webhook pod
  • it should be inside a logger: field, right?
  • Yee

    Yee

    3 weeks ago
    yeah
  • it’s “top-level” from propeller’s perspective.
  • which means it’s second level from the perspective of the config map
  • cuz helm/flyte mounts these individual sections as different files in the container
  • in any case, to verify, restart propeller too
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    ok I nested it under another logger: and did a helm upgrade
  • Yee

    Yee

    3 weeks ago
    propeller has a lot more debug messages, it should be super obvious it worked
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    the new pod is initializing
  • Yee

    Yee

    3 weeks ago
    like it’s so obvious you probably want to turn that back off afterwards
  • for better or for worse the webhook and propeller both use the same propeller command/image/config today
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    yeah it worked! time to run the workflow
  • Yee

    Yee

    3 weeks ago
    perfect
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    nothing... no warnings, not anything
  • Yee

    Yee

    3 weeks ago
    and you restarted the webhook pod too?
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    if the webhook is able to retrieve the secrets, does it log anything?
  • yeah
  • Yee

    Yee

    3 weeks ago
    no it doesn’t log
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    these were the last logs, before I even started the task
  • Yee

    Yee

    3 weeks ago
    you’re using k8s secrets right?
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    yes
  • Yee

    Yee

    3 weeks ago
    yeah we need to add more logging
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    if I look at the console, the command asks for the secrets
  • Yee

    Yee

    3 weeks ago
    and the pod spec doesn’t have any secrets?
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    I just noticed that it asks for 2 secrets twice. this is a problem on my end, but it shouldn't cause this bug, right?
  • Yee

    Yee

    3 weeks ago
    sorry what are you testing again? env or file?
  • or both?
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    yeah
  • file
  • Yee

    Yee

    3 weeks ago
    so when i was testing that gh issue, something like this shows up in the pod
  • - name: orsxg4bnm4zg54lql3
      secret:
        defaultMode: 420
        items:
        - key: test-file
          path: test-file
        secretName: test-group
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    let me enter the pod real quick to confirm that there's nothing mounted
  • oh so there is something in the pod spec?
  • Yee

    Yee

    3 weeks ago
    yeah
  • the webhook alters the podspec before submitting to the pod handler
  • it’s a mutating webhook
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    there's nothing like that in my pod spec. the only mention of secrets is some annotations in the metadata
  • metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
        flyte.secrets/s0: ...
        flyte.secrets/s1: ...
        ...
  • there are 6 of those flyte.secrets annotations
  • and they all look like hashes
  • Yee

    Yee

    3 weeks ago
    is the FLYTE_SECRETS_DEFAULT_DIR env var specified?
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    not in the pod spec, no
  • Yee

    Yee

    3 weeks ago
    i see
  • then it’s not hitting that code
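The checks discussed so far (secret annotations present, no secret volume, no FLYTE_SECRETS_DEFAULT_DIR) can be run against a pod manifest in one pass. The sketch below is a hypothetical helper, not part of Flyte; it assumes the manifest was obtained with `kubectl get pod <name> -o json` and parsed into a dict. Note that a pod may carry secret volumes unrelated to Flyte, so a non-empty list is only a hint.

```python
from typing import Any, Dict


def secret_injection_report(pod: Dict[str, Any]) -> Dict[str, Any]:
    """Summarise the signs of webhook mutation in a pod manifest."""
    metadata = pod.get("metadata") or {}
    spec = pod.get("spec") or {}

    # Secrets requested by the task show up as flyte.secrets/sN annotations.
    annotations = metadata.get("annotations") or {}
    requested = sorted(k for k in annotations if k.startswith("flyte.secrets/"))

    # A mutated pod with file secrets should have "secret" volumes.
    volumes = spec.get("volumes") or []
    secret_volumes = [v.get("name") for v in volumes if "secret" in v]

    # Collect env var names across all containers.
    env_names = {
        env.get("name")
        for container in spec.get("containers") or []
        for env in container.get("env") or []
    }

    return {
        "requested_annotations": requested,
        "secret_volumes": secret_volumes,
        "default_dir_env_set": "FLYTE_SECRETS_DEFAULT_DIR" in env_names,
    }
```

Annotations that are present together with an empty `secret_volumes` list and `default_dir_env_set` being false match the situation described here: the secrets were requested, but the pod spec was never modified.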
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    but that's weird... shouldn't it be automatically set?
  • Yee

    Yee

    3 weeks ago
    no i guess it only sets it if there are file secrets
  • if not it won’t set them, which is okay
  • trying to think of what else to try
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    but what's weird to me is that I request the secrets, they exist, but somehow are not being mounted. maybe the problem is that FLYTE_SECRETS_DEFAULT_DIR isn't set, so no secret is mounted?
  • or is the variable set after the secrets are mounted?
  • Yee

    Yee

    3 weeks ago
    what do you mean by they exist
  • like in k8s?
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    yeah, they exist in the cluster
  • I said that because once I had a problem that a task wouldn't start because k8s couldn't find the secrets
  • (that was in the sandbox)
  • Yee

    Yee

    3 weeks ago
    there’s no before or after here
  • it’s just modifying the pod spec…
  • after all the modifications are done, then it gets submitted to k8s for creation
  • trying something locally
  • yeah can you try making your key an empty string ("")
  • you should still be able to register.
  • and if the webhook is getting called at all, you should trigger this error
  • i can see that log line locally
  • and if you still don’t… then the webhook just isn’t getting called at all
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    ok perfect
  • trying it out, just a sec
  • Yee

    Yee

    3 weeks ago
    actually what do you see when you do
    $ k get mutatingwebhookconfigurations
    NAME                WEBHOOKS   AGE
    flyte-pod-webhook   1          2d23h
  • do you see that?
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    yeah, the pod is running even though no key exists
  • i actually removed the key in two secrets
  • Yee

    Yee

    3 weeks ago
    what does the log say?
  • the webhook log
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    NAME                                                      WEBHOOKS   AGE
    datadog-webhook                                           2          95d
    flyte-pod-webhook                                         1          4d1h
  • i do see the webhook
  • Yee

    Yee

    3 weeks ago
    but no logs?
  • no error message?
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    nothing in the webhook logs. it hasn't logged anything since i started it
  • Yee

    Yee

    3 weeks ago
    so the webhook logic isn’t being called at all
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    weird question: is the namespace "flyte" hardcoded anywhere on flytepropeller?
  • Yee

    Yee

    3 weeks ago
    yeah in one of the things.
  • the leader election config i think
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    we deployed the server on the namespace "ml-flyte". I changed some things
  • the leader election I was able to change
  • Yee

    Yee

    3 weeks ago
    that’s the only one i know of
  • why do you ask?
  • can you do -o yaml on the webhook?
  • i have a ca bundle and this
        service:
          name: flyte-pod-webhook
          namespace: flyte
          path: /mutate--v1-pod
          port: 443
      failurePolicy: Ignore
      matchPolicy: Equivalent
      name: flyte-pod-webhook.flyte.org
      namespaceSelector: {}
      objectSelector:
        matchLabels:
          inject-flyte-secrets: "true"
      reinvocationPolicy: Never
      rules:
      - apiGroups:
        - '*'
        apiVersions:
        - v1
        operations:
        - CREATE
        resources:
        - pods
        scope: '*'
      sideEffects: NoneOnDryRun
      timeoutSeconds: 10
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    our server is deployed on a "ml-flyte" namespace
  • Yee

    Yee

    3 weeks ago
    what do you see?
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    not "flyte"
  • Yee

    Yee

    3 weeks ago
    that should be fine
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    can you send me the whole -o yaml command? should it be a describe pod?
  • Yee

    Yee

    3 weeks ago
    kubectl get mutatingwebhookconfigurations flyte-pod-webhook -o yaml
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
        service:
          name: flyte-pod-webhook
          namespace: ml-dev
          path: /mutate--v1-pod
          port: 443
      failurePolicy: Ignore
      matchPolicy: Equivalent
      name: flyte-pod-webhook.flyte.org
      namespaceSelector: {}
      objectSelector:
        matchLabels:
          inject-flyte-secrets: "true"
      reinvocationPolicy: Never
      rules:
      - apiGroups:
        - '*'
        apiVersions:
        - v1
        operations:
        - CREATE
        resources:
        - pods
        scope: '*'
      sideEffects: NoneOnDryRun
      timeoutSeconds: 10
  • do you want anything before the caBundle?
  • Yee

    Yee

    3 weeks ago
    ml-dev?
  • not ml-flyte?
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    oops sorry, that was the one from the dev namespace
  • hang on
  •     service:
          name: flyte-pod-webhook
          namespace: ml-flyte
          path: /mutate--v1-pod
          port: 443
      failurePolicy: Ignore
      matchPolicy: Equivalent
      name: flyte-pod-webhook.flyte.org
      namespaceSelector: {}
      objectSelector:
        matchLabels:
          inject-flyte-secrets: "true"
      reinvocationPolicy: Never
      rules:
      - apiGroups:
        - '*'
        apiVersions:
        - v1
        operations:
        - CREATE
        resources:
        - pods
        scope: '*'
      sideEffects: NoneOnDryRun
      timeoutSeconds: 10
  • same thing basically
  • Yee

    Yee

    3 weeks ago
    weird, i didn’t think they were a namespaced resource at all
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    maybe that's the issue?
  • Yee

    Yee

    3 weeks ago
    no i don’t think so
  • that’s what i see on my end.
  • and it’s working, at least for the sandbox
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    No plugin found for Handler-type [python-task], defaulting to [container],
  • can this be anything?
  • it's in the flytepropeller logs.
  • Yee

    Yee

    3 weeks ago
    nah that’s the correct behavior
  • need a way to look at kubeapi logs
  • meeting brb
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    ok I'll be looking into it
  • Yee

    Yee

    3 weeks ago
    and you have inject-flyte-secrets: "true" in your task pod labels right?
  • and what apiversion are your pods?
  • apiVersion: v1
    kind: Pod
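The webhook configuration pasted earlier uses an objectSelector with matchLabels, and Kubernetes evaluates such a selector against the object's labels only; an annotation with the same key does not count. The sketch below is a hypothetical check over a parsed pod manifest (for example from `kubectl get pod <name> -o json`) that mirrors the selector and the pod/v1 rule from that configuration. It is a simplification: it ignores matchExpressions and the namespaceSelector, which is empty in the configs shown.

```python
from typing import Any, Dict


def matches_object_selector(pod: Dict[str, Any],
                            match_labels: Dict[str, str]) -> bool:
    """True if every matchLabels entry is present in the pod's labels."""
    labels = (pod.get("metadata") or {}).get("labels") or {}
    # Label values are strings, so "true" must match exactly.
    return all(labels.get(key) == value for key, value in match_labels.items())


def webhook_would_select(pod: Dict[str, Any]) -> bool:
    """Apply the selector and resource rule from the pasted webhook config."""
    is_v1_pod = pod.get("apiVersion") == "v1" and pod.get("kind") == "Pod"
    return is_v1_pod and matches_object_selector(
        pod, {"inject-flyte-secrets": "true"}
    )
```

A pod that carries `inject-flyte-secrets: "true"` only under `metadata.annotations` returns False here, so it is worth confirming under which of the two maps the key actually appears.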
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    yes, and v1 versions
  • there are no propeller logs for when the webhook fails right?
  • Yee

    Yee

    3 weeks ago
    no…
  • i mean there are
  • but like i think that would be pretty obvious.
  • i don’t think the issue is the webhook failing, i think it’s not being called at all
  • @jeev have you used secrets at all on gcp?
  • flyte secrets that is
  • or webhooks in general
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    yeah sorry, I was thinking about it not being called, if there was any way to check that
  • j

    jeev

    3 weeks ago
    no we don't use the secret webhook. we just provision the namespaces with the appropriate secrets in place.
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    wait... how does that work?
  • like, the entire reason we want to use the webhook is to inject sensitive data into our containers (like MLflow credentials). I thought the only way to do that was using the webhook and secret_requests. Is there another way?
  • by the way, all of our Flyte projects share the same namespace, ml-flyte-projects.
  • j

    jeev

    3 weeks ago
    we pre-create the namespace, add secrets to it by any method of your choice, then just tell flyte to mount the secrets in as env vars or file via the pod spec. this can be done as a sidecar job or via pod templates in flytepropeller
  • just manage the namespace/secrets outside of flyte basically
  • Yee

    Yee

    3 weeks ago
    and jeev did you do that cuz secrets weren’t there at the time? or because you had issues getting them to work?
  • j

    jeev

    3 weeks ago
    this was our preferred design - declarative and easily reproducible.
  • Yee

    Yee

    3 weeks ago
    got it, cool
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    @jeev oh, cool! that's perfect! so, just to be clear, the pod template config you're referring to is this one, right? https://docs.flyte.org/en/stable/deployment/cluster_config/flytepropeller_config.html#default-pod-template-name-string and you should set it like this?
    configmap:
      k8s:
        k8s:
          default-pod-template-name: <PodTemplate created in the same namespace as FlytePropeller>
  • the sidecar job is created with the sidecar plugin, right?
  • j

    jeev

    3 weeks ago
    right. we haven't used pod templates there, but it was meant to support this exact use case - platform-defined default for pod spec. we’re currently using sidecar tasks to define the pod spec.
  • we will be migrating to pod templates!
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    ok, perfect! I'll try to use it and I'll let both of you know. thank you very much for your patience! 😄
  • Hey, everyone! Good morning! Quick question about pod templates. Do I have to specify everything for a specific template, or is it like an override for the default Flyte template?
  • j

    jeev

    3 weeks ago
    not 100% sure on this. @Dan Rammer (hamersaw) can speak to it. though i do know that the PodTemplate object still has to be valid, so many of the fields will be required.
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    Ok, I'll look into it. @Yee, if you want to continue debugging what's going on later, I'll be available. Looking at how to use pod templates, it seems that I cannot limit when a secret should be mounted (@jeev can correct me if I'm wrong), so I think the webhook is a more granular option.
  • j

    jeev

    3 weeks ago
    @Matheus Moreno: correct. this would be for a project-specific default pod spec. this is more of freenome's use case.
  • Yee

    Yee

    3 weeks ago
    i’m down to do a screenshare if you want. now’ish?
  • i don’t have any more information, but maybe looking at the gcp console might provide some more clues
  • Dan Rammer (hamersaw)

    Dan Rammer (hamersaw)

    3 weeks ago
    @Matheus Moreno the default PodTemplate docs issue provides a little more context. Basically, the PodTemplate is used as a base for all k8s Pods to be built on. So we start with the PodTemplate, layer k8s plugin configuration, and then layer the specific task configuration. So you can specify as little as you want.
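The layering order described above (PodTemplate first, then k8s plugin configuration, then task configuration) can be pictured as successive overrides on a base dict. The following Python sketch only illustrates that ordering with a naive recursive merge; it is an assumption for illustration, not Flyte's actual merge logic, and it replaces lists wholesale instead of merging containers or volumes by name.

```python
from copy import deepcopy
from typing import Any, Dict


def layer(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return base with override applied on top; nested dicts merge recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = layer(merged[key], value)
        else:
            # Scalars and lists from the later layer win outright.
            merged[key] = deepcopy(value)
    return merged


def build_pod_spec(pod_template: Dict[str, Any],
                   plugin_config: Dict[str, Any],
                   task_config: Dict[str, Any]) -> Dict[str, Any]:
    """Start from the template, then apply plugin config, then task config."""
    return layer(layer(pod_template, plugin_config), task_config)
```

Under this picture a template only needs the fields it wants to set as defaults; anything it omits comes from the later layers, and anything a task sets explicitly overrides the template.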
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    Hi, everyone! Sorry for the late reply. I'm very happy to report that I was able to inject secrets in a more granular manner using a Pod Task specification. I made a very simple function, similar to the example in the documentation, that creates a V1SecretVolumeSource and a V1VolumeMount with the required secrets. Since it's a sidecar task, I believe it's similar to what Jeev is doing right now. For now, it works perfectly! 🙏
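The function itself was not posted in the thread. As a rough idea of what such a helper produces, the sketch below builds the volume and volume mount as plain dicts using Kubernetes manifest field names, the same shape as the volume snippet pasted earlier in the thread. The V1SecretVolumeSource and V1VolumeMount models in the Kubernetes Python client carry the same information (secret name and items; volume name and mount path). The function name, default volume name and read-only setting are hypothetical choices.

```python
from typing import Any, Dict, List, Optional, Tuple


def secret_volume_and_mount(
    secret_name: str,
    keys: List[str],
    mount_path: str,
    volume_name: Optional[str] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build a pod-spec volume and a matching container volumeMount."""
    # Hypothetical default: derive the volume name from the secret name.
    volume_name = volume_name or f"{secret_name}-volume"
    volume = {
        "name": volume_name,
        "secret": {
            "secretName": secret_name,
            # Mount each key as a file named after the key.
            "items": [{"key": key, "path": key} for key in keys],
        },
    }
    mount = {"name": volume_name, "mountPath": mount_path, "readOnly": True}
    return volume, mount
```

The volume goes under `spec.volumes` of the pod and the mount under `volumeMounts` of the task container; each key then appears as a file at `<mount_path>/<key>`, provided the Kubernetes secret exists in the namespace where the pod runs.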
  • Yee

    Yee

    3 weeks ago
    hey would you mind pasting the labels and annotations for one of the task pods again?
  • just want to verify that those are being set correctly for the webhook to pick up on.
  • like in your annotation you see something like
    flyte.secrets/s0: m4zg54lqhiqce4dfon1c1z2sn41xaiqknnsxsoraej1gk32ufvsw34rcbjww54loorpxezlrovuxezlnmvxhioraivhfmx1wifjau
    right?
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    yeah, those are being set up
  • and there's also an annotation with "inject-flyte-secrets: true"
  • Yee

    Yee

    3 weeks ago
    okay… then i suspect something is amiss on the gke side.
  • sorry 😞 but it’ll be a bit before we can investigate more
  • i think something is happening on the gke side that is preventing these webhooks from running
  • unless you’re seeing other webhooks go through?
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    Maybe that's it... I don't know if other webhooks are running on our cluster. I only have access to certain namespaces
  • Yee

    Yee

    3 weeks ago
    well if you want to pursue this, is this something you can ask around internally with the team that set up the cluster?
  • Matheus Moreno

    Matheus Moreno

    3 weeks ago
    sure, I can talk to them. A friend of mine is actually the admin of the cluster
  • I'll let you know what I could find
  • Yee

    Yee

    3 weeks ago
    yes please, thanks