# flyte-deployment
m
Hey, everyone! I need help with one other thing. Almost everything works in our (GCP) cluster. The only thing that doesn't seem to be working is secret injection. The pod is initialized with `flyte.secrets/sX` annotations, but the task doesn't actually find any secret. How can I debug this? A bit more context: we are injecting environment variables and service account JSONs into the tasks. Because of that, we retrieve secrets directly from `/etc/flyte/secrets`, because `SecretsManager()` ends up applying `upper()` and `lower()` to the keys, which messes up the configuration of the env vars (and files). But apparently no secret is being added to this path...
By the way, no environment variables for the secrets (like FLYTE_SECRETS_DEFAULT_DIR) are being set. Maybe it's a problem with the webhook?
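For reference, reading a mounted secret straight from the filesystem, without any case changes to the group or key, can be sketched roughly as follows. This is a minimal illustration and not Flyte's own API: the helper name `read_mounted_secret` is hypothetical, and the `<base>/<group>/<key>` layout is an assumption based on the file paths the webhook reports in its logs.

```python
import os
from pathlib import Path
from typing import Optional


def read_mounted_secret(group: str, key: str, base_dir: Optional[str] = None) -> str:
    """Read a file-mounted secret without altering the case of group or key.

    Assumption: secrets are laid out as <base_dir>/<group>/<key>.
    The base directory falls back to FLYTE_SECRETS_DEFAULT_DIR, then to
    /etc/flyte/secrets (the path mentioned in this thread).
    """
    root = Path(base_dir or os.environ.get("FLYTE_SECRETS_DEFAULT_DIR", "/etc/flyte/secrets"))
    path = root / group / key
    if not path.is_file():
        # A missing file here usually means the volume was never mounted.
        raise FileNotFoundError(f"secret file not found: {path}")
    return path.read_text()
```

If this raises `FileNotFoundError` for every secret, the volume was most likely never added to the pod, which points at the webhook rather than at the task code.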
Yeah, no luck with the secrets being mounted:
flyte@afdxj58wfblmgn52glcb-n1-0:/$ cd /etc/flyte/secrets
bash: cd: /etc/flyte/secrets: No such file or directory
flyte@afdxj58wfblmgn52glcb-n1-0:/$ cd /etc/secrets
bash: cd: /etc/secrets: No such file or directory
Anyone? No luck debugging it 😞
k
@Yee
y
your webhook is running right?
m
yes
y
anything amiss in the logs?
m
message has been deleted
message has been deleted
just this
this is on flyteadmin
and on the console I can see on the task definition that the secrets were requested
y
sorry i haven’t played with this before. let me do some more digging
m
if you need me to log or run anything please let me know
this seems to be the only thing not working in the cluster right now
y
yeah sorry still digging
also digging into the ticket you submitted
erm… a while back
logger:
  level: 5
  show-source: true
that is the bit to add for logging
were you able to add this logging @Matheus Moreno
i added it to my local sandbox while debugging the other issue you raised.
those tasks are still working for me, but there are also some red herring error messages in the webhook log. at least i’m assuming they must be, since it’s working
{"json":{"src":"secrets.go:54"},"level":"info","msg":"Failed to inject a secret using injector [Global]. Error: secrets not found - Env [FLYTE_SECRET_TEST-GROUP_TEST-ENV], file [/etc/secrets/test-group/test-env]","ts":"2022-07-18T20:03:47Z"}
{"json":{"src":"secrets.go:54"},"level":"info","msg":"Failed to inject a secret using injector [Global]. Error: secrets not found - Env [FLYTE_SECRET_TEST-GROUP_TEST-FILE], file [/etc/secrets/test-group/test-file]","ts":"2022-07-18T20:03:47Z"}
that is what i’m seeing locally in sandbox.
but also
{
  "o0": "Hello world, these are my secrets: TESTING_ENV / TESTING_FILE"
}
so it just writes an error log for all the failed attempts
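To see at a glance which environment variable and file each failed attempt looked for, the JSON log lines above can be summarized with a small script. This is a rough sketch: the regular expression is written against the exact message format shown in this thread and may not match other Flyte versions, and `failed_attempts` is a hypothetical helper name.

```python
import json
import re

# Matches the message format seen in the webhook log lines in this thread.
ATTEMPT = re.compile(
    r"injector \[(?P<injector>[^\]]+)\].*Env \[(?P<env>[^\]]+)\], file \[(?P<file>[^\]]+)\]"
)


def failed_attempts(log_text: str) -> list:
    """Return one dict per 'Failed to inject' line: injector, env name, file path."""
    found = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines
        try:
            msg = json.loads(line).get("msg", "")
        except json.JSONDecodeError:
            continue
        match = ATTEMPT.search(msg)
        if match:
            found.append(match.groupdict())
    return found
```

Feeding it the output of `kubectl logs` for the webhook pod gives a list of the names each injector tried. An empty list with debug logging on suggests the webhook was never invoked.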
m
yeah i'm trying it out right now
Nothing is appearing in the logs. I don't know if the update actually worked
the update = changing the logger level
y
did you restart the pod?
the webhook pod
which config did you update?
m
I added this to values.yaml
configmap:
  logger:
    level: 6
    show-source: true
the flyte-propeller-config is updated on my k8s
y
yeah so on sandbox, my propeller config looks like this after modifying
storage.yaml: |
    logger:
      level: 5
      show-source: true
    storage:
      type: minio
      ...
it doesn’t matter which yaml you put it in.
i just picked storage cuz it was at the bottom of the screen
but the webhook pod will need to be restarted after you update the configmap
m
hang on i think i messed it up
/etc/flyte/config $ cat logger.yaml 
level: 6
show-source: true
/etc/flyte/config $
it's like this inside the webhook pod
it should be inside a `logger:` field, right?
y
yeah
it’s “top-level” from propeller’s perspective.
which means it’s second level from the perspective of the config map
cuz helm/flyte mounts these individual sections as different files in the container
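The nesting issue can be modeled in a few lines. This is a simplified sketch, not the Helm chart's real templating: it assumes each key under `configmap` becomes a file named `<key>.yaml` containing only that key's value, and that the process merges all mounted files before looking for a top-level `logger` section. Both function names are hypothetical.

```python
def render_config_files(configmap: dict) -> dict:
    """Model of the mount: each top-level key becomes <key>.yaml with its value as content."""
    return {f"{name}.yaml": body for name, body in configmap.items()}


def logger_is_visible(files: dict) -> bool:
    """Merge all files (shallow, for illustration) and look for a top-level `logger` section."""
    merged = {}
    for body in files.values():
        merged.update(body)
    logger = merged.get("logger")
    return isinstance(logger, dict) and "level" in logger


# Not nested: logger.yaml contains `level: 6` at the top level, so no `logger` section exists.
flat = render_config_files({"logger": {"level": 6, "show-source": True}})

# Nested once more: logger.yaml contains `logger: {level: 6, ...}`, which is what is needed.
nested = render_config_files({"logger": {"logger": {"level": 6, "show-source": True}}})

print(logger_is_visible(flat), logger_is_visible(nested))
```

Under these assumptions the flat form is not picked up and the nested form is, which matches what was observed inside the webhook pod.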
in any case, to verify, restart propeller too
m
ok I nested it with another `logger:` and did a helm upgrade
y
propeller has a lot more debug messages, it should be super obvious it worked
m
the new pod is initializing
y
like it’s so obvious you probably want to turn that back off afterwards
for better or for worse the webhook and propeller both use the same propeller command/image/config today
m
yeah it worked! time to run the workflow
y
perfect
m
nothing... no warnings, not anything
y
and you restarted the webhook pod too?
m
if the webhook is able to retrieve the secrets, does it log anything?
yeah
y
no it doesn’t log
m
these were the last logs, before I even started the task
y
you’re using k8s secrets right?
m
yes
y
yeah we need to add more logging
m
if I look at the console, the command asks for the secrets
y
and the pod spec doesn’t have any secrets?
m
I just noticed that it asks for 2 secrets twice. this is a problem on my end, but it shouldn't cause this bug, right?
y
sorry what are you testing again? env or file?
or both?
m
yeah
file
y
so when i was testing that gh issue, something like this shows up in the pod
- name: orsxg4bnm4zg54lql3
  secret:
    defaultMode: 420
    items:
    - key: test-file
      path: test-file
    secretName: test-group
m
let me enter the pod real quick to confirm that there's nothing mounted
oh so there is something in the pod spec?
y
yeah
the webhook alters the podspec before submitting to the pod handler
it’s a mutating webhook
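As general background, a Kubernetes mutating admission webhook answers each request with an `AdmissionReview` whose `patch` field is a base64-encoded JSON Patch, and the API server applies that patch to the pod before creating it. The sketch below shows the shape of such a response for adding one secret volume. It is an illustration only, not Flyte's implementation: the function names and the volume name are hypothetical, and a real injector would also add volume mounts and environment variables.

```python
import base64
import json


def secret_volume_patch(pod: dict, secret_name: str, key: str, volume_name: str) -> list:
    """Build a JSON Patch that adds a secret-backed volume to the pod spec."""
    volume = {
        "name": volume_name,  # hypothetical name, chosen by the caller
        "secret": {"secretName": secret_name, "items": [{"key": key, "path": key}]},
    }
    if pod.get("spec", {}).get("volumes"):
        # Append to the existing list.
        return [{"op": "add", "path": "/spec/volumes/-", "value": volume}]
    # No volumes yet: create the list.
    return [{"op": "add", "path": "/spec/volumes", "value": [volume]}]


def admission_response(uid: str, patch: list) -> dict:
    """Wrap a JSON Patch in an admission.k8s.io/v1 AdmissionReview response."""
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": uid,
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    }
```

Because the change happens before the pod is created, a pod spec with no secret volume at all means the patch was never produced or never applied.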
m
there's nothing like that in my pod spec. the only mention of secrets is some annotations in the metadata
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    flyte.secrets/s0: ...
    flyte.secrets/s1: ...
    ...
there are 6 of those `flyte.secrets` annotations, and they all look like hashes
y
is the `FLYTE_SECRETS_DEFAULT_DIR` env var specified?
m
not in the pod spec, no
y
i see
then it’s not hitting that code
m
but that's weird... shouldn't it be automatically set?
y
no i guess it only sets it if there are file secrets
if not it won’t set them, which is okay
trying to think of what else to try
m
but what's weird to me is that I request the secrets, they exist, but somehow are not being mounted. maybe the problem is that the `FLYTE_SECRETS_DEFAULT_DIR` isn't set, so no secret is mounted?
or is the variable set after the secrets are mounted?
y
what do you mean by they exist
like in k8s?
m
yeah, they exist in the cluster
I said that because once I had a problem that a task wouldn't start because k8s couldn't find the secrets
(that was in the sandbox)
y
there’s no before or after here
it’s just modifying the pod spec…
after all the modifications are done, then it gets submitted to k8s for creation
trying something locally
yeah can you try making your key a `""`
you should still be able to register.
and if the webhook is getting called at all, you should trigger this error
i can see that log line locally
and if you still don’t… then the webhook just isn’t getting called at all
m
ok perfect
trying it out, just a sec
y
actually what do you see when you do
$ k get mutatingwebhookconfigurations
NAME                WEBHOOKS   AGE
flyte-pod-webhook   1          2d23h
do you see that?
m
yeah, the pod is running even though no key exists
i actually removed the key in two secrets
y
what does the log say?
the webhook log
m
NAME                                                      WEBHOOKS   AGE
datadog-webhook                                           2          95d
flyte-pod-webhook                                         1          4d1h
i do see the webhook
y
but no logs?
no error message?
m
nothing in the webhook logs. it hasn't logged anything since i started it
y
so the webhook logic isn’t being called at all
m
weird question: is the namespace "flyte" hardcoded anywhere on flytepropeller?
y
yeah in one of the things.
the leader election config i think
m
we deployed the server on the namespace "ml-flyte". I changed some things
the leader election I was able to change
y
that’s the only one i know of
why do you ask?
can you do -o yaml on the webhook?
i have a ca bundle and this
service:
      name: flyte-pod-webhook
      namespace: flyte
      path: /mutate--v1-pod
      port: 443
  failurePolicy: Ignore
  matchPolicy: Equivalent
  name: flyte-pod-webhook.flyte.org
  namespaceSelector: {}
  objectSelector:
    matchLabels:
      inject-flyte-secrets: "true"
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - '*'
    apiVersions:
    - v1
    operations:
    - CREATE
    resources:
    - pods
    scope: '*'
  sideEffects: NoneOnDryRun
  timeoutSeconds: 10
m
our server is deployed on a "ml-flyte" namespace
y
what do you see?
m
not "flyte"
y
that should be fine
m
can you send me the whole -o yaml command? should be a describe pod?
y
kubectl get mutatingwebhookconfigurations flyte-pod-webhook -o yaml
m
service:
      name: flyte-pod-webhook
      namespace: ml-dev
      path: /mutate--v1-pod
      port: 443
  failurePolicy: Ignore
  matchPolicy: Equivalent
  name: flyte-pod-webhook.flyte.org
  namespaceSelector: {}
  objectSelector:
    matchLabels:
      inject-flyte-secrets: "true"
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - '*'
    apiVersions:
    - v1
    operations:
    - CREATE
    resources:
    - pods
    scope: '*'
  sideEffects: NoneOnDryRun
  timeoutSeconds: 10
do you want anything before the caBundle?
y
`ml-dev`? not `ml-flyte`?
m
oops sorry, that was the one from the dev namespace
hang on
service:
      name: flyte-pod-webhook
      namespace: ml-flyte
      path: /mutate--v1-pod
      port: 443
  failurePolicy: Ignore
  matchPolicy: Equivalent
  name: flyte-pod-webhook.flyte.org
  namespaceSelector: {}
  objectSelector:
    matchLabels:
      inject-flyte-secrets: "true"
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - '*'
    apiVersions:
    - v1
    operations:
    - CREATE
    resources:
    - pods
    scope: '*'
  sideEffects: NoneOnDryRun
  timeoutSeconds: 10
same thing basically
y
weird, i didn’t think they were a namespaced resource at all
m
maybe that's the issue?
y
no i don’t think so
that’s what i see on my end.
and it’s working, at least for the sandbox
m
No plugin found for Handler-type [python-task], defaulting to [container],
can this be anything?
it's in the flytepropeller logs.
y
nah that’s the correct behavior
need a way to look at kubeapi logs
meeting brb
m
ok I'll be looking into it
y
and you have
inject-flyte-secrets: "true"
in your task pod labels right?
and what apiversion are your pods?
apiVersion: v1
kind: Pod
m
yes, and v1 versions
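Since the webhook configuration selects pods through `objectSelector.matchLabels`, the check the API server performs can be approximated as below. This is a simplified sketch (real label selectors also support `matchExpressions`) with a hypothetical helper name. Note that it looks only at `metadata.labels`: the same key set as an annotation would not match.

```python
def matches_object_selector(match_labels: dict, pod: dict) -> bool:
    """Return True when every matchLabels pair is present in the pod's labels."""
    labels = pod.get("metadata", {}).get("labels") or {}
    return all(labels.get(key) == value for key, value in match_labels.items())


selector = {"inject-flyte-secrets": "true"}

labeled_pod = {"metadata": {"labels": {"inject-flyte-secrets": "true"}}}
annotated_only_pod = {"metadata": {"annotations": {"inject-flyte-secrets": "true"}}}

print(matches_object_selector(selector, labeled_pod))         # selected by the webhook
print(matches_object_selector(selector, annotated_only_pod))  # not selected
```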
there are no propeller logs for when the webhook fails right?
y
no…
i mean there are
but like i think that would be pretty obvious.
i don’t think the issue is the webhook failing, i think it’s not being called at all
@jeev have you used secrets at all on gcp?
flyte secrets that is
or webhooks in general
m
yeah sorry, I was thinking about it not being called, if there was any way to check that
j
no we don't use the secret webhook. we just provision the namespaces with the appropriate secrets in place.
m
wait... how does that work?
like, the entire reason we want to use the webhook is to inject sensitive data into our containers (like MLflow credentials). I thought the only way to do that was using the webhook and `secret_requests`. Is there another way?
by the way, all of our Flyte projects share the same namespace, `ml-flyte-projects`.
j
we pre-create the namespace, add secrets to it by any method of your choice, then just tell flyte to mount the secrets in as env vars or file via the pod spec. this can be done as a sidecar job or via pod templates in flytepropeller
just manage the namespace/secrets outside of flyte basically
y
and jeev did you do that cuz secrets weren’t there at the time? or because you had issues getting them to work?
j
this was our preferred design - declarative and easily reproducible.
y
got it, cool
m
@jeev oh, cool! that's perfect! so, just to be clear, the pod template config you're referring to is this one, right? https://docs.flyte.org/en/stable/deployment/cluster_config/flytepropeller_config.html#default-pod-template-name-string And you set it like this?
configmap:
  k8s:
    k8s:
      default-pod-template-name: <PodTemplate created in the same namespace as FlytePropeller>
the sidecar job is created with the sidecar plugin, right?
j
right. we haven't used pod templates there, but it was meant to support this exact use case - platform-defined default for pod spec. we’re currently using sidecar tasks to define the pod spec.
we will be migrating to pod templates!
m
ok, perfect! I'll try to use it and I'll let both of you know. thank you very much for your patience! 😄
Hey, everyone! Good morning! Quick question about pod templates. Do I have to specify everything for a specific template, or is it like an override for the default Flyte template?
j
not 100% sure on this. @Dan Rammer (hamersaw) can speak to it. though i do know that the `PodTemplate` object still has to be valid, so many of the fields will be required.
m
Ok, I'll look into it. @Yee, if you want to continue debugging what's going on later, I'll be available. Looking at how to use pod templates, it seems that I cannot limit when a secret should be mounted (@jeev can correct me if I'm wrong), so I think the webhook is a more granular option.
j
@Matheus Moreno: correct. this would be for a project-specific default pod spec. this is more of Freenome's use case.
y
i’m down to do a screenshare if you want. now’ish?
i don’t have any more information, but maybe looking at the gcp console might provide some more clues
d
@Matheus Moreno the default PodTemplate docs issue provides a little more context. Basically, the PodTemplate is used as a base for all k8s Pods to be built on. So we start with the PodTemplate, layer k8s plugin configuration, and then layer the specific task configuration. So you can specify as little as you want.
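The layering described here (PodTemplate as the base, then k8s plugin configuration, then task configuration) can be pictured as successive overrides. The sketch below uses a plain recursive dict merge to show the idea. It is an illustration under simplifying assumptions, not Flyte's actual merge logic: lists are replaced wholesale here, and the function names are hypothetical.

```python
import copy


def layer(base: dict, override: dict) -> dict:
    """Return base with override applied on top; nested dicts are merged recursively."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = layer(result[key], value)
        else:
            # Scalars and lists from the later layer win outright (a simplification).
            result[key] = copy.deepcopy(value)
    return result


def build_pod_spec(pod_template: dict, plugin_config: dict, task_config: dict) -> dict:
    """Apply the three layers in order: template, then plugin config, then task config."""
    return layer(layer(pod_template, plugin_config), task_config)
```

Anything the template sets and no later layer touches survives into the final pod, which is why the template can be as small as needed.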
m
Hi, everyone! Sorry for the late reply. I'm very happy to report that I was able to inject secrets in a more granular manner using a Pod Task specification. I made a very simple function, similar to the example in the documentation, that creates a `V1SecretVolumeSource` and a `V1VolumeMount` with the required secrets. Since it's a sidecar task, I believe it's something similar to what Jeev is doing right now. For now, it works perfectly! 🙏
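The function itself was not posted, so the following is only a guess at its general shape. To stay runnable without the `kubernetes` package it builds plain dicts whose fields mirror the Kubernetes API (`secretName`, `items`, `mountPath`, `readOnly`) instead of the `V1SecretVolumeSource` and `V1VolumeMount` classes. The helper names, the default volume name, and the example secret and path in the usage lines are all hypothetical.

```python
from typing import List, Optional, Tuple


def secret_volume_and_mount(
    secret_name: str, keys: List[str], mount_dir: str, volume_name: Optional[str] = None
) -> Tuple[dict, dict]:
    """Build a secret-backed volume and the matching read-only volume mount."""
    name = volume_name or f"{secret_name}-volume"  # hypothetical naming scheme
    volume = {
        "name": name,
        "secret": {
            "secretName": secret_name,
            "items": [{"key": key, "path": key} for key in keys],
        },
    }
    mount = {"name": name, "mountPath": mount_dir, "readOnly": True}
    return volume, mount


def attach(pod_spec: dict, container_name: str, volume: dict, mount: dict) -> dict:
    """Add the volume to the pod spec and the mount to the named container."""
    pod_spec.setdefault("volumes", []).append(volume)
    for container in pod_spec.get("containers", []):
        if container["name"] == container_name:
            container.setdefault("volumeMounts", []).append(mount)
            return pod_spec
    raise ValueError(f"container {container_name!r} not found in pod spec")


# Usage with made-up names: mount one key of a pre-created secret into the primary container.
spec = {"containers": [{"name": "primary"}]}
vol, mnt = secret_volume_and_mount("mlflow-credentials", ["token"], "/etc/flyte/secrets/mlflow-credentials")
attach(spec, "primary", vol, mnt)
```

Because the volume is declared per task, only the tasks that need a given secret get it mounted, which is the granularity the default PodTemplate approach lacks.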
y
hey would you mind pasting the labels and annotations for one of the task pods again?
just want to verify that those are being set correctly for the webhook to pick up on.
like in your annotation you see something like
flyte.secrets/s0: m4zg54lqhiqce4dfon1c1z2sn41xaiqknnsxsoraej1gk32ufvsw34rcbjww54loorpxezlrovuxezlnmvxhioraivhfmx1wifjau
right?
m
yeah, those are being set up
and there's also an annotation with "inject-flyte-secrets: true"
y
okay… then i suspect something is amiss on the gke side.
sorry 😞 but it’ll be a bit before we can investigate more
i think something is happening on the gke side that is preventing these webhooks from running
unless you’re seeing other webhooks go through?
m
Maybe that's it... I don't know if other webhooks are running on our cluster. I only have access to certain namespaces
y
well if you want to pursue this, is this something you can ask around internally with the team that set up the cluster?
m
sure, I can talk to them. A friend of mine is actually the admin of the cluster
I'll let you know what I could find
y
yes please, thanks