# ask-the-community
z
Hello, I'm trying to run a demo workflow with a custom image in the local sandbox, but am having authentication issues pulling the image from Azure Container Registry. I initialized the sandbox with `flytectl sandbox start`.
I created a secrets file, `acr-secrets-flytesnacks-development.yaml`:
```yaml
apiVersion: v1
data:
  .dockerconfigjson: ***Base64 encoded json***
kind: Secret
metadata:
  name: acr-pull-credentials
  namespace: flytesnacks-development
type: kubernetes.io/dockerconfigjson
```
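For reference, the `.dockerconfigjson` value is just a Base64-encoded Docker `config.json`. A minimal Python sketch of producing one (the registry name and token below are placeholders, not values from this thread; `az acr login` typically writes an `auth` of `<guid>:<token>` plus an `identitytoken` field):

```python
import base64
import json

# Placeholder registry and credentials, purely illustrative.
registry = "myregistry.azurecr.io"
identity_token = "<token>"

docker_config = {
    "auths": {
        registry: {
            # `az acr login` typically stores "<guid>:<token>" as the auth pair
            # alongside an identitytoken field; both are placeholders here.
            "auth": base64.b64encode(
                f"00000000-0000-0000-0000-000000000000:{identity_token}".encode()
            ).decode(),
            "identitytoken": identity_token,
        }
    }
}

# This Base64 string is what goes into data[".dockerconfigjson"] in the Secret.
encoded = base64.b64encode(json.dumps(docker_config).encode()).decode()
```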
I applied the secret in the project-domain namespace with:
```shell
kubectl -n flytesnacks-development apply -f secrets/acr-secrets-flytesnacks-development.yaml
```
I patched the default service account within the project-domain namespace:
```shell
kubectl -n flytesnacks-development patch serviceaccount default -p '{"imagePullSecrets": [{"name": "acr-pull-credentials"}]}'
```
I run my workflow with:
```shell
pyflyte run --remote --image ***.azurecr.io/databricks_workflow:latest databricks_wf.py databricks_workflow --sql 'select...'
```
The pods get stuck in Pending, and `kubectl describe` shows the following issue:
```
Failed to pull image "***.azurecr.io/databricks_workflow:latest": rpc error: code = Unknown desc = Error response from daemon: Head "https://*****.azurecr.io/v2/databricks_workflow/manifests/latest": unauthorized: authentication required, visit https://aka.ms/acr/authorization for more information.
```
Placing the decoded secret in `.docker/config.json` allows `docker pull` to work locally. Thanks for any suggestions!
y
can you check the pod?
get the `-o yaml` for the pod and grep for pull secrets?
z
I don't understand what command you mean by check. K8s is still relatively new to me.
y
oh i just mean the pod that is stuck
can you run
```shell
kubectl -n flytesnacks-development get pod <pod_name> -o yaml
```
and see if there's a mention of `imagePullSecrets`
can you also use the `-o yaml` switch on the default serviceaccount
it should look like this example:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: 2015-08-07T22:02:39Z
  name: default
  namespace: default
  uid: 052fb0f4-3d50-11e5-b066-42010af0d7b6
imagePullSecrets:
- name: myregistrykey
```
just want to confirm that the patch worked, that nothing else reverted it after, etc.
z
So for the service account, I think it looks correct:
```yaml
apiVersion: v1
imagePullSecrets:
- name: acr-pull-credentials
kind: ServiceAccount
metadata:
  creationTimestamp: "2022-09-16T14:31:27Z"
  name: default
  namespace: flytesnacks-development
  resourceVersion: "5798"
  uid: dde5ad00-83f2-4f20-8c17-86d92f6a4001
secrets:
- name: default-token-9lvqx
```
And here's what I got from grepping the output of `get pod` for the stuck pod:
```yaml
imagePullSecrets:
  - name: acr-pull-credentials
  nodeName: 9a8ded8a5390
```
y
so it’s getting added… and it’s just not able to pull it. interesting.
can you just run the pod for me?
one sec let me write you the yaml
save this to a file and try `kubectl create -f file.yaml`
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: debugpod
  namespace: flytesnacks-development
spec:
  serviceAccountName: default
  imagePullSecrets:
    - name: acr-pull-credentials
  containers:
  - name: abc
    image: blah.azurecr.io/databricks_workflow:latest
    imagePullPolicy: IfNotPresent
    command:
    - sleep
    - infinity
```
i dunno if i got that right… hope so
z
It appears to have landed in the same status, `ImagePullBackOff`.
Apparently that status counts against resource limits. I had to delete the previous pods to start `debugpod`.
I'm going to check the credentials again. I think that my docker config might have been updated by the az cli between when I pasted in the credentials and when I ran `docker pull`. Maybe they are actually misformatted.
Nope, confirmed that decoding the data from the secret and placing it in docker config.json works
y
sorry was afk… were you able to run the debugpod?
i’m just trying to isolate out flyte
z
It doesn't run. It gets the same authentication error
y
ah okay
that’s good though.
did you follow this format for the secret?
```shell
kubectl create secret docker-registry <secret-name> \
    --namespace <namespace> \
    --docker-server=<container-registry-name>.azurecr.io \
    --docker-username=<service-principal-ID> \
    --docker-password=<service-principal-password>
```
z
No, I created the following yaml and then applied it to the namespace:
```yaml
apiVersion: v1
data:
  .dockerconfigjson: ***Base64 encoded json***
kind: Secret
metadata:
  name: acr-pull-credentials
  namespace: flytesnacks-development
type: kubernetes.io/dockerconfigjson
```
y
what were the keys of your json?
z
```
auths
   address.acr.io
       auth
       identitytoken
```
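A quick way to double-check those keys (a hypothetical helper, not part of any tool in this thread) is to decode the secret's value and list what each registry entry carries:

```python
import base64
import json

def registry_auth_keys(dockerconfigjson_b64: str) -> dict:
    """Decode a .dockerconfigjson value and report which credential keys
    (e.g. auth, identitytoken, username, password) each registry entry has."""
    config = json.loads(base64.b64decode(dockerconfigjson_b64))
    return {registry: sorted(entry) for registry, entry in config.get("auths", {}).items()}

# Placeholder payload mirroring the structure described above.
sample = base64.b64encode(json.dumps(
    {"auths": {"address.acr.io": {"auth": "xxx", "identitytoken": "yyy"}}}
).encode()).decode()

print(registry_auth_keys(sample))  # {'address.acr.io': ['auth', 'identitytoken']}
```

You could feed it the live value via something like `kubectl get secret acr-pull-credentials -o jsonpath='{.data.\.dockerconfigjson}'`.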
y
can you try the other way?
z
I'll try. I'm using my own AD right now rather than a SP
y
sorry what’s an AD?
(I’ve never used azure before sorry, only aws and gcp)
z
active directory. It's an MS single sign on/LDAP kind of thing
It complains about the username not being a service principal GUID. I'll see if I can set up a SP and try again
Thanks for your help! At least now I know it's somewhere in the secret itself
y
any azure users here know how to get image pull secrets set up?
s
@Nick Müller (MorpheusXAUT), would you mind helping @Zachary Kimble? I presume that you’re on Azure.
n
We are, unfortunately I'm out of office/on holiday until next week 😕 We were seeing similar issues, I remember we set the pull secrets on the pods directly instead of using service accounts, but I would have to look up the rest next week when I'm back, sorry
s
No problem, thank you!
z
No worries Nick, Thanks for offering to look when you're back!
n
@Zachary Kimble were you able to fix it yourself in the meantime or should I take another look at our setup? 🙂
z
@Nick Müller (MorpheusXAUT) Still stuck. Any more guidance would be appreciated. I haven't tried mounting the secret to the pod's file system. Is that what you mean by "set the pull secrets on the pods directly".
n
sure thing, will get back to you tomorrow 🙂
@Zachary Kimble just checked our configuration. it looks like we're using something pretty similar to what you are doing. we've actually added secret creation to our slightly modified version of the flyte helm chart. it's being created as:
```
{{- with .Values.createImagePullSecrets }}
{{- range $secretName, $secret := . }}
---
apiVersion: v1
kind: Secret
type: kubernetes.io/dockerconfigjson
metadata:
  name: {{ $secretName }}
  {{- if $secret.annotations }}
  annotations: {{- toYaml $secret.annotations | nindent 4 }}
  {{- end }}
  {{- if $secret.labels }}
  labels: {{- toYaml $secret.labels | nindent 4 }}
  {{- end }}
data:
  .dockerconfigjson: {{ template "imagePullSecret" $secret }}
{{- end }}
{{- end }}
```
where `template "imagePullSecret"` would be:
```
{{- define "imagePullSecret" }}
{{- printf "{\"auths\":{\"%s\":{\"username\":\"%s\",\"password\":\"%s\",\"auth\":\"%s\"}}}" .registry .username .password (printf "%s:%s" .username .password | b64enc) | b64enc }}
{{- end }}
```
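In plain Python, that Helm helper does roughly the following (function and variable names here are illustrative, not from the chart):

```python
import base64
import json

def image_pull_secret(registry: str, username: str, password: str) -> str:
    """Sketch of the `imagePullSecret` Helm helper above: build the auths
    JSON, Base64-encode "username:password" for the auth field, then
    Base64-encode the whole document for the Secret's data field."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    payload = json.dumps(
        {"auths": {registry: {"username": username, "password": password, "auth": auth}}}
    )
    return base64.b64encode(payload.encode()).decode()

secret_value = image_pull_secret("myregistry.azurecr.io", "sp-id", "sp-password")
```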
that results in a secret similar to yours being created:
```yaml
apiVersion: v1
kind: Secret
type: kubernetes.io/dockerconfigjson
metadata:
  name: azure-acr-bshrkmain-access
data:
  .dockerconfigjson: <base64>
```
when decoding the `.dockerconfigjson` value, it'll have the following structure:
```json
{
  "auths": {
    "<REGISTRY>": {
      "username": "<USERNAME>",
      "password": "<PASSWORD>",
      "auth": "<BASE64 OF USERNAME:HASH>"
    }
  }
}
```
The `default` service account in each flyte namespace has said image pull secret attached:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: <NAMESPACE>
secrets:
  - name: <DEFAULT TOKEN>
imagePullSecrets:
  - name: <IMAGE PULL SECRET NAME>
```
I'm honestly not sure how the image pull secret is assigned to the default service accounts in the flyte namespaces, didn't find any template for the cluster resource sync for it 🤔 I guess it might happen automatically/implicitly due to secrets assigned to the other flyte service accounts?
z
Thanks for the detailed configs, @Nick Müller (MorpheusXAUT). The main difference that stands out to me is "username" and "password" in the dockerconfig. I'm providing "identitytoken", so looks like that's my problem. Perhaps either k3s or flyte doesn't support logging into a private registry with an identity token. I'm going to see if I can get a service principal set up and see if that fixes things.
n
hmm, that's a good question, we've always been using username/password for that. might definitely be worth checking out. just out of curiosity: can you manually create pods with secrets from your private registry if you're using the `identitytoken`?
z
Got it working using a service principal username/password. Thanks for the help @Nick Müller (MorpheusXAUT). And no, I wasn't able to successfully start a pod using the `identitytoken`. I get the same authorization message. So not a Flyte issue. I am able to pull the same image using docker inside the local sandbox container. My guess is that `identitytoken` doesn't work with k3s, but not sure.
n
glad to hear you got it sorted out, although it's weird the `identitytoken` wouldn't work...