#flyte-deployment
m

Matheus Moreno

07/15/2022, 6:43 PM
Hey, everyone! I need help with one other thing. So, almost everything works in our (GCP) cluster. The only thing that looks like it isn't working is secret injection. The pod is initialized with annotations flyte.secrets/sX, but no secret is actually being found by the task. How can I debug this? A little more context: we are injecting environment variables and service account JSONs in the tasks. Because of that, we are actually retrieving secrets directly from /etc/flyte/secrets, because the SecretsManager() ends up applying upper() and lower() to the keys, which messes up the configuration of the env vars (and files). But apparently no secret is being added to this path...
By the way, no environment variables for the secrets (like FLYTE_SECRETS_DEFAULT_DIR) are being set. Maybe it's a problem with the webhook?
Yeah, no luck with the secrets being mounted:
flyte@afdxj58wfblmgn52glcb-n1-0:/$ cd /etc/flyte/secrets
bash: cd: /etc/flyte/secrets: No such file or directory
flyte@afdxj58wfblmgn52glcb-n1-0:/$ cd /etc/secrets
bash: cd: /etc/secrets: No such file or directory
Anyone? No luck debugging it 😞
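A minimal sketch of the check described above, meant to be run inside a task container to see what was actually injected. The helper name `secret_report` is hypothetical; only FLYTE_SECRETS_DEFAULT_DIR, /etc/flyte/secrets and /etc/secrets come from the messages above, and matching on the FLYTE_SECRET prefix is an assumption made for illustration.

```python
import os
from pathlib import Path


def secret_report(environ, candidate_dirs):
    """Summarize secret-related env vars and mounted secret files.

    Hypothetical helper: it only inspects what is already in the container.
    """
    # Assumption: variables of interest start with FLYTE_SECRET
    # (this covers FLYTE_SECRETS_DEFAULT_DIR mentioned in the thread).
    env_hits = sorted(k for k in environ if k.startswith("FLYTE_SECRET"))

    files = {}
    for d in candidate_dirs:
        root = Path(d)
        if not root.is_dir():
            files[str(root)] = None  # directory is not mounted at all
            continue
        found = []
        for p in root.rglob("*"):
            rel = p.relative_to(root)
            # Skip the "..data"-style bookkeeping entries of secret mounts.
            if p.is_file() and not any(part.startswith("..") for part in rel.parts):
                # Names are reported as-is: no upper()/lower() is applied.
                found.append(rel.as_posix())
        files[str(root)] = sorted(found)
    return {"env": env_hits, "files": files}


if __name__ == "__main__":
    print(secret_report(os.environ, ["/etc/flyte/secrets", "/etc/secrets"]))
```

A value of None for a directory corresponds to the "No such file or directory" result in the shell session above.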
k

katrina

07/15/2022, 8:33 PM
@Yee
y

Yee

07/15/2022, 8:55 PM
your webhook is running right?
m

Matheus Moreno

07/15/2022, 8:55 PM
yes
y

Yee

07/15/2022, 8:58 PM
anything amiss in the logs?
m

Matheus Moreno

07/15/2022, 9:01 PM
message has been deleted
message has been deleted
just this
this is on flyteadmin
and on the console I can see on the task definition that the secrets were requested
y

Yee

07/15/2022, 9:05 PM
sorry i haven’t played with this before. let me do some more digging
m

Matheus Moreno

07/15/2022, 10:36 PM
if you need me to log or run anything please let me know
this seems to be the only thing not working in the cluster right now
y

Yee

07/15/2022, 10:38 PM
yeah sorry still digging
also digging into the ticket you submitted
erm… a while back
logger:
  level: 5
  show-source: true
that is the bit to add for logging
were you able to add this logging @Matheus Moreno
i added it to my local sandbox while debugging the other issue you raised.
those tasks are still working for me, but there are also some red herring error messages in the webhook log. at least i’m assuming they must be, since it’s working
{"json":{"src":"secrets.go:54"},"level":"info","msg":"Failed to inject a secret using injector [Global]. Error: secrets not found - Env [FLYTE_SECRET_TEST-GROUP_TEST-ENV], file [/etc/secrets/test-group/test-env]","ts":"2022-07-18T20:03:47Z"}
{"json":{"src":"secrets.go:54"},"level":"info","msg":"Failed to inject a secret using injector [Global]. Error: secrets not found - Env [FLYTE_SECRET_TEST-GROUP_TEST-FILE], file [/etc/secrets/test-group/test-file]","ts":"2022-07-18T20:03:47Z"}
that is what i’m seeing locally in sandbox.
but also
{
  "o0": "Hello world, these are my secrets: TESTING_ENV / TESTING_FILE"
}
so it just writes an error log for all the failed attempts
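A small sketch for pulling these entries out of a webhook log dump, assuming the log is JSON lines shaped like the two samples above. The function name `failed_injections` is hypothetical. As noted in the thread, such entries are written per attempted injector, so a hit here does not by itself mean the secret ended up missing.

```python
import json


def failed_injections(log_lines):
    """Return (source, message) pairs for 'Failed to inject a secret' entries."""
    hits = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line, ignore it
        if not isinstance(entry, dict):
            continue
        msg = entry.get("msg", "")
        if "Failed to inject a secret" in msg:
            # "json.src" is present when show-source is enabled in the logger config.
            src = (entry.get("json") or {}).get("src", "")
            hits.append((src, msg))
    return hits
```

An empty result on a cluster where secrets are requested is itself a clue: either the injection succeeded silently, or the webhook was never called.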
m

Matheus Moreno

07/18/2022, 8:17 PM
yeah i'm trying it out right now
Nothing is appearing in the logs. I don't know if the update actually worked
the update = changing the logger level
y

Yee

07/18/2022, 8:26 PM
did you restart the pod?
the webhook pod
which config did you update?
m

Matheus Moreno

07/18/2022, 8:27 PM
I added this to values.yaml
configmap:
  logger:
    level: 6
    show-source: true
the flyte-propeller-config is updated on my k8s
y

Yee

07/18/2022, 8:28 PM
yeah so on sandbox, my propeller config looks like this after modifying
storage.yaml: |
    logger:
      level: 5
      show-source: true
    storage:
      type: minio
      ...
it doesn’t matter which yaml you put it in.
i just picked storage cuz it was at the bottom of the screen
but the webhook pod will need to be restarted after you update the configmap
m

Matheus Moreno

07/18/2022, 8:31 PM
hang on i think i messed it up
/etc/flyte/config $ cat logger.yaml 
level: 6
show-source: true
/etc/flyte/config $
its like this inside the webhook pod
it should be inside a logger: field right?
y

Yee

07/18/2022, 8:32 PM
yeah
it’s “top-level” from propeller’s perspective.
which means it’s second level from the perspective of the config map
cuz helm/flyte mounts these individual sections as different files in the container
in any case, to verify, restart propeller too
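A toy model of the nesting described above, assuming (as the messages suggest) that each first-level key under `configmap:` is mounted as its own `<key>.yaml` file and that the top-level keys of all files are what propeller sees. The function names are hypothetical and the shallow merge is a simplification, not how the real config loader works.

```python
def mounted_files(configmap_values):
    """Model: each first-level key under `configmap:` becomes `<key>.yaml`."""
    return {f"{section}.yaml": body for section, body in configmap_values.items()}


def effective_config(files):
    """Model: the top-level keys of every mounted file are merged together."""
    merged = {}
    for body in files.values():
        merged.update(body)
    return merged


# One level of nesting: logger.yaml contains `level: 6`, so there is no
# top-level `logger` section for propeller or the webhook to pick up.
wrong = effective_config(mounted_files({"logger": {"level": 6, "show-source": True}}))

# Two levels of nesting: logger.yaml contains `logger: {level: 6, ...}`.
right = effective_config(
    mounted_files({"logger": {"logger": {"level": 6, "show-source": True}}})
)

print("logger" in wrong, "logger" in right)
```

The same model explains why putting the `logger:` block inside the storage section also works: the file name does not matter, only the top-level key inside the file.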
m

Matheus Moreno

07/18/2022, 8:34 PM
ok I nested it with another logger: and did a helm upgrade
y

Yee

07/18/2022, 8:34 PM
propeller has a lot more debug messages, it should be super obvious it worked
m

Matheus Moreno

07/18/2022, 8:34 PM
the new pod is initializing
y

Yee

07/18/2022, 8:34 PM
like it’s so obvious you probably want to turn that back off afterwards
for better or for worse the webhook and propeller both use the same propeller command/image/config today
m

Matheus Moreno

07/18/2022, 8:35 PM
yeah it worked! time to run the workflow
y

Yee

07/18/2022, 8:35 PM
perfect
m

Matheus Moreno

07/18/2022, 8:38 PM
nothing... no warnings, not anything
y

Yee

07/18/2022, 8:38 PM
and you restarted the webhook pod too?
m

Matheus Moreno

07/18/2022, 8:38 PM
if the webhook is able to retrieve the secrets, does it log anything?
yeah
y

Yee

07/18/2022, 8:38 PM
no it doesn’t log
m

Matheus Moreno

07/18/2022, 8:39 PM
these were the last logs, before I even started the task
y

Yee

07/18/2022, 8:39 PM
you’re using k8s secrets right?
m

Matheus Moreno

07/18/2022, 8:39 PM
yes
y

Yee

07/18/2022, 8:40 PM
yeah we need to add more logging
m

Matheus Moreno

07/18/2022, 8:40 PM
if I look at the console, the command asks for the secrets
y

Yee

07/18/2022, 8:41 PM
and the pod spec doesn’t have any secrets?
m

Matheus Moreno

07/18/2022, 8:41 PM
I just noticed that it asks for 2 secrets twice. this is a problem on my end, but it shouldn't cause this bug, right?
y

Yee

07/18/2022, 8:41 PM
sorry what are you testing again? env or file?
or both?
m

Matheus Moreno

07/18/2022, 8:41 PM
yeah
file
y

Yee

07/18/2022, 8:42 PM
so when i was testing that gh issue, something like this shows up in the pod
- name: orsxg4bnm4zg54lql3
  secret:
    defaultMode: 420
    items:
    - key: test-file
      path: test-file
    secretName: test-group
m

Matheus Moreno

07/18/2022, 8:42 PM
let me enter the pod real quick to confirm that there's nothing mounted
oh so there is something in the pod spec?
y

Yee

07/18/2022, 8:42 PM
yeah
the webhook alters the podspec before submitting to the pod handler
it’s a mutating webhook
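A rough sketch of that comparison, assuming the pod manifest has been loaded into a dict (for example from `kubectl get pod <name> -o json`). The helper names are hypothetical, and the check only covers file-style secrets, which show up as Secret-backed volumes like the one pasted above.

```python
def secret_volumes(pod):
    """Names of volumes in a pod manifest that are backed by a Secret."""
    volumes = pod.get("spec", {}).get("volumes") or []
    return [v["name"] for v in volumes if "secret" in v]


def flyte_secret_annotations(pod):
    """Annotation keys of the form flyte.secrets/sN, i.e. requested secrets."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    return sorted(k for k in annotations if k.startswith("flyte.secrets/"))


def looks_unmutated(pod):
    """True when secrets were requested but no Secret volume was added."""
    return bool(flyte_secret_annotations(pod)) and not secret_volumes(pod)
```

If `looks_unmutated` returns True for a task pod that requests file secrets, the pod spec reached Kubernetes without the webhook having modified it.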
m

Matheus Moreno

07/18/2022, 8:44 PM
there's nothing like that in my pod spec. the only mention of secrets is some annotations in the metadata
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    flyte.secrets/s0: ...
    flyte.secrets/s1: ...
    ...
there are 6 of those flyte.secrets and they all look like hashes
y

Yee

07/18/2022, 8:46 PM
is the FLYTE_SECRETS_DEFAULT_DIR env var specified?
m

Matheus Moreno

07/18/2022, 8:47 PM
not in the pod spec, no
y

Yee

07/18/2022, 8:47 PM
i see
then it’s not hitting that code
m

Matheus Moreno

07/18/2022, 8:48 PM
but that's weird... shouldn't it be automatically set?
y

Yee

07/18/2022, 8:50 PM
no i guess it only sets it if there are file secrets
if not it won’t set them, which is okay
trying to think of what else to try
m

Matheus Moreno

07/18/2022, 8:55 PM
but what's weird to me is that I request the secrets, they exist, but somehow are not being mounted. maybe the problem is that the FLYTE_SECRETS_DEFAULT_DIR isn't set, so no secret is mounted?
or is the variable set after the secrets are mounted?
y

Yee

07/18/2022, 8:56 PM
what do you mean by they exist
like in k8s?
m

Matheus Moreno

07/18/2022, 8:56 PM
yeah, they exist in the cluster
I said that because once I had a problem that a task wouldn't start because k8s couldn't find the secrets
(that was in the sandbox)
y

Yee

07/18/2022, 8:57 PM
there’s no before or after here
it’s just modifying the pod spec…
after all the modifications are done, then it gets submitted to k8s for creation
trying something locally
yeah can you try making your key an empty string ("")?
you should still be able to register.
and if the webhook is getting called at all, you should trigger this error
i can see that log line locally
and if you still don’t… then the webhook just isn’t getting called at all
m

Matheus Moreno

07/18/2022, 9:07 PM
ok perfect
trying it out, just a sec
y

Yee

07/18/2022, 9:15 PM
actually what do you see when you do
$ k get mutatingwebhookconfigurations
NAME                WEBHOOKS   AGE
flyte-pod-webhook   1          2d23h
do you see that?
m

Matheus Moreno

07/18/2022, 9:16 PM
yeah, the pod is running even though no key exists
i actually removed the key in two secrets
y

Yee

07/18/2022, 9:17 PM
what does the log say?
the webhook log
m

Matheus Moreno

07/18/2022, 9:17 PM
NAME                                                      WEBHOOKS   AGE
datadog-webhook                                           2          95d
flyte-pod-webhook                                         1          4d1h
i do see the webhook
y

Yee

07/18/2022, 9:18 PM
but no logs?
no error message?
m

Matheus Moreno

07/18/2022, 9:18 PM
nothing in the webhook logs. it hasn't logged anything since i started it
y

Yee

07/18/2022, 9:18 PM
so the webhook logic isn’t being called at all
m

Matheus Moreno

07/18/2022, 9:19 PM
weird question: is the namespace "flyte" hardcoded anywhere on flytepropeller?
y

Yee

07/18/2022, 9:20 PM
yeah in one of the things.
the leader election config i think
m

Matheus Moreno

07/18/2022, 9:20 PM
we deployed the server on the namespace "ml-flyte". I changed some things
the leader election I was able to change
y

Yee

07/18/2022, 9:20 PM
that’s the only one i know of
why do you ask?
can you do -o yaml on the webhook?
i have a ca bundle and this
    service:
      name: flyte-pod-webhook
      namespace: flyte
      path: /mutate--v1-pod
      port: 443
  failurePolicy: Ignore
  matchPolicy: Equivalent
  name: flyte-pod-webhook.flyte.org
  namespaceSelector: {}
  objectSelector:
    matchLabels:
      inject-flyte-secrets: "true"
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - '*'
    apiVersions:
    - v1
    operations:
    - CREATE
    resources:
    - pods
    scope: '*'
  sideEffects: NoneOnDryRun
  timeoutSeconds: 10
m

Matheus Moreno

07/18/2022, 9:21 PM
our server is deployed on a "ml-flyte" namespace
y

Yee

07/18/2022, 9:21 PM
what do you see?
m

Matheus Moreno

07/18/2022, 9:21 PM
not "flyte"
y

Yee

07/18/2022, 9:21 PM
that should be fine
m

Matheus Moreno

07/18/2022, 9:22 PM
can you send me the whole -o yaml command? should be a describe pod?
y

Yee

07/18/2022, 9:22 PM
kubectl get mutatingwebhookconfigurations flyte-pod-webhook -o yaml
m

Matheus Moreno

07/18/2022, 9:23 PM
    service:
      name: flyte-pod-webhook
      namespace: ml-dev
      path: /mutate--v1-pod
      port: 443
  failurePolicy: Ignore
  matchPolicy: Equivalent
  name: flyte-pod-webhook.flyte.org
  namespaceSelector: {}
  objectSelector:
    matchLabels:
      inject-flyte-secrets: "true"
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - '*'
    apiVersions:
    - v1
    operations:
    - CREATE
    resources:
    - pods
    scope: '*'
  sideEffects: NoneOnDryRun
  timeoutSeconds: 10
do you want anything before the caBundle?
y

Yee

07/18/2022, 9:23 PM
ml-dev? not ml-flyte?
m

Matheus Moreno

07/18/2022, 9:24 PM
oops sorry, that was the one from the dev namespace
hang on
    service:
      name: flyte-pod-webhook
      namespace: ml-flyte
      path: /mutate--v1-pod
      port: 443
  failurePolicy: Ignore
  matchPolicy: Equivalent
  name: flyte-pod-webhook.flyte.org
  namespaceSelector: {}
  objectSelector:
    matchLabels:
      inject-flyte-secrets: "true"
  reinvocationPolicy: Never
  rules:
  - apiGroups:
    - '*'
    apiVersions:
    - v1
    operations:
    - CREATE
    resources:
    - pods
    scope: '*'
  sideEffects: NoneOnDryRun
  timeoutSeconds: 10
same thing basically
y

Yee

07/18/2022, 9:25 PM
weird, i didn’t think they were a namespaced resource at all
m

Matheus Moreno

07/18/2022, 9:26 PM
maybe that's the issue?
y

Yee

07/18/2022, 9:26 PM
no i don’t think so
that’s what i see on my end.
and it’s working, at least for the sandbox
m

Matheus Moreno

07/18/2022, 9:30 PM
No plugin found for Handler-type [python-task], defaulting to [container],
can this be anything?
it's in the flytepropeller logs.
y

Yee

07/18/2022, 9:32 PM
nah that’s the correct behavior
need a way to look at kubeapi logs
meeting brb
m

Matheus Moreno

07/18/2022, 9:36 PM
ok I'll be looking into it
y

Yee

07/18/2022, 10:13 PM
and you have inject-flyte-secrets: "true" in your task pod labels right?
and what apiversion are your pods?
apiVersion: v1
kind: Pod
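A sketch of the label check being asked about, assuming the pod manifest is available as a dict and using the matchLabels value from the webhook configuration pasted earlier. The function name is hypothetical. It models only the matchLabels part of a Kubernetes objectSelector, which looks at metadata.labels and never at annotations.

```python
def object_selector_matches(pod, match_labels):
    """True if every matchLabels entry is present in the pod's labels."""
    labels = pod.get("metadata", {}).get("labels") or {}
    return all(labels.get(key) == value for key, value in match_labels.items())


# Value taken from the webhook configuration shown in the thread.
selector = {"inject-flyte-secrets": "true"}

labelled = {"metadata": {"labels": {"inject-flyte-secrets": "true"}}}
annotated_only = {"metadata": {"annotations": {"inject-flyte-secrets": "true"}}}

print(object_selector_matches(labelled, selector))        # label present
print(object_selector_matches(annotated_only, selector))  # annotation only
```

With failurePolicy: Ignore, as in the configuration above, a pod is still admitted unmodified if the API server cannot reach the webhook, so a pod that matches the selector but shows no mutation points at the call itself not going through.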
m

Matheus Moreno

07/18/2022, 10:14 PM
yes, and v1 versions
there are no propeller logs for when the webhook fails right?
y

Yee

07/18/2022, 10:21 PM
no…
i mean there are
but like i think that would be pretty obvious.
i don’t think the issue is the webhook failing, i think it’s not being called at all
@jeev have you used secrets at all on gcp?
flyte secrets that is
or webhooks in general
m

Matheus Moreno

07/18/2022, 10:26 PM
yeah sorry, I was thinking about it not being called, if there was any way to check that
j

jeev

07/18/2022, 10:29 PM
no we don't use the secret webhook. we just provision the namespaces with the appropriate secrets in place.
m

Matheus Moreno

07/18/2022, 10:37 PM
wait... how does that work?
like, the entire reason we want to use the webhook is to inject sensitive data into our containers (like MLflow credentials). I thought the only way to do that was using the webhook and secret_requests. Is there another way?
by the way, all of our Flyte projects share the same namespace, ml-flyte-projects.
j

jeev

07/18/2022, 10:54 PM
we pre-create the namespace, add secrets to it by any method of your choice, then just tell flyte to mount the secrets in as env vars or file via the pod spec. this can be done as a sidecar job or via pod templates in flytepropeller
just manage the namespace/secrets outside of flyte basically
y

Yee

07/18/2022, 10:58 PM
and jeev did you do that cuz secrets weren’t there at the time? or because you had issues getting them to work?
j

jeev

07/18/2022, 11:39 PM
this was our preferred design - declarative and easily reproducible.
y

Yee

07/18/2022, 11:52 PM
got it, cool
m

Matheus Moreno

07/19/2022, 12:56 AM
@jeev oh, cool! that's perfect! so, just to be clear, the pod template config you're referring to is this one, right? https://docs.flyte.org/en/stable/deployment/cluster_config/flytepropeller_config.html#default-pod-template-name-string you should set it like this?
configmap:
  k8s:
    k8s:
      default-pod-template-name: <PodTemplate created in the same namespace as FlytePropeller>
the sidecar job is created with the sidecar plugin, right?
j

jeev

07/19/2022, 12:59 AM
right. we haven't used pod templates there, but it was meant to support this exact use case - platform-defined default for pod spec. we’re currently using sidecar tasks to define the pod spec.
we will be migrating to pod templates!
m

Matheus Moreno

07/19/2022, 1:00 AM
ok, perfect! I'll try to use it and I'll let both of you know. thank you very much for your patience! 😄
Hey, everyone! Good morning! Quick question about pod templates. Do I have to specify everything for a specific template, or is it like an override for the default Flyte template?
j

jeev

07/19/2022, 4:05 PM
not 100% sure on this. @Dan Rammer (hamersaw) can speak to it. though i do know that the PodTemplate object still has to be valid, so many of the fields will be required.
m

Matheus Moreno

07/19/2022, 5:28 PM
Ok, I'll look into it. @Yee, if you want to continue debugging what's going on later, I'll be available. Looking at how to use pod templates, it seems that I cannot limit when a secret should be mounted (@jeev can correct me if I'm wrong), so I think the webhook is a more granular option.
j

jeev

07/19/2022, 6:18 PM
@Matheus Moreno: correct. this would be for a project-specific default pod spec. this is more of freenome's use case.
y

Yee

07/19/2022, 6:20 PM
i’m down to do a screenshare if you want. now’ish?
i don’t have any more information, but maybe looking at the gcp console might provide some more clues
d

Dan Rammer (hamersaw)

07/19/2022, 6:28 PM
@Matheus Moreno the default PodTemplate docs issue provides a little more context. Basically, the PodTemplate is used as a base for all k8s Pods to be built on. So we start with the PodTemplate, layer k8s plugin configuration, and then layer the specific task configuration. So you can specify as little as you want.
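A toy illustration of that layering order only, assuming each layer can be treated as a plain dict where later layers win on conflicts. The function names and the example keys are hypothetical, and the real merge in flytepropeller is more involved (lists and containers in particular are not handled here).

```python
import copy


def layer(base, override):
    """Recursively overlay `override` on `base`; `override` wins on conflicts."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = layer(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def build_pod_spec(pod_template, plugin_config, task_config):
    """PodTemplate first, then k8s plugin configuration, then the task itself."""
    return layer(layer(pod_template, plugin_config), task_config)


# Illustrative values only.
template = {"serviceAccountName": "default", "nodeSelector": {"pool": "cpu"}}
plugin = {"nodeSelector": {"pool": "gpu", "zone": "a"}}
task = {"serviceAccountName": "ml"}
print(build_pod_spec(template, plugin, task))
```

Under this model a template only needs to carry the fields it wants to set as defaults; anything it omits is filled in by the later layers.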
m

Matheus Moreno

07/19/2022, 8:25 PM
Hi, everyone! Sorry for the late reply. I'm very happy to report that I was able to inject secrets in a more granular manner using a Pod Task specification. I made a very simple function, similar to the example in the documentation, that creates a V1SecretVolumeSource and a V1VolumeMount with the required secrets. Since it's a sidecar task, I believe it's something similar to what Jeev is doing right now. For now, it works perfectly! 🙏
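The function itself was not posted, so the following is only a guess at its shape, written with plain dicts instead of the kubernetes client classes so it runs without extra packages. The dict keys are the Kubernetes manifest fields that V1Volume, V1SecretVolumeSource and V1VolumeMount map to; the helper name, the default volume name, and the example mount path are assumptions.

```python
def secret_volume_and_mount(secret_name, keys, mount_path, volume_name=None):
    """Build a Secret-backed volume and its matching volume mount.

    Hypothetical helper: returns manifest-style dicts for one Kubernetes Secret.
    """
    volume_name = volume_name or f"{secret_name}-vol"
    volume = {
        "name": volume_name,
        "secret": {
            "secretName": secret_name,
            # Each key is projected to a file of the same name, case preserved.
            "items": [{"key": key, "path": key} for key in keys],
        },
    }
    mount = {"name": volume_name, "mountPath": mount_path, "readOnly": True}
    return volume, mount


if __name__ == "__main__":
    vol, mnt = secret_volume_and_mount(
        "test-group", ["test-file"], "/etc/flyte/secrets/test-group"
    )
    print(vol)
    print(mnt)
```

The volume goes into the pod spec's volumes list and the mount into the task container's volumeMounts, which is what lets each task decide which secrets it receives.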
y

Yee

07/19/2022, 8:49 PM
hey would you mind pasting the labels and annotations for one of the task pods again?
just want to verify that those are being set correctly for the webhook to pick up on.
like in your annotation you see something like
flyte.secrets/s0: m4zg54lqhiqce4dfon1c1z2sn41xaiqknnsxsoraej1gk32ufvsw34rcbjww54loorpxezlrovuxezlnmvxhioraivhfmx1wifjau
right?
m

Matheus Moreno

07/19/2022, 8:51 PM
yeah, those are being set up
and there's also an annotation with "inject-flyte-secrets: true"
y

Yee

07/19/2022, 9:02 PM
okay… then i suspect something is amiss on the gke side.
sorry 😞 but it’ll be a bit before we can investigate more
i think something is happening on the gke side that is preventing these webhooks from running
unless you’re seeing other webhooks go through?
m

Matheus Moreno

07/19/2022, 9:10 PM
Maybe that's it... I don't know if other webhooks are running on our cluster. I only have access to certain namespaces
y

Yee

07/19/2022, 9:11 PM
well if you want to pursue this, is this something you can ask around internally with the team that set up the cluster?
m

Matheus Moreno

07/19/2022, 9:12 PM
sure, I can talk to them. A friend of mine is actually the admin of the cluster
I'll let you know what I find
y

Yee

07/19/2022, 9:15 PM
yes please, thanks