Tom Stokes
12/12/2022, 1:38 PM
$ docker ps
>>>
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
dbf8f5dcb150   cr.flyte.org/flyteorg/flyte-sandbox:dind-bfa1dd4e6057b6fc16272579d61df7b1832b96a7   "tini flyte-entrypoi…"   About an hour ago   Up About an hour   0.0.0.0:30081-30082->30081-30082/tcp, 0.0.0.0:30084->30084/tcp, 2375-2376/tcp, 0.0.0.0:30086-30088->30086-30088/tcp   flyte-sandbox
From this we can then list the images that exist inside the dbf8f5dcb150 container:
$ docker exec -it dbf8f5dcb150 docker image ls
>>>
REPOSITORY              TAG      IMAGE ID       CREATED          SIZE
papermill-exploration   latest   3c40c6deb126   23 minutes ago   948MB
...
I can see my project in there under the tag papermill-exploration:latest.
I then serialize and submit my workflow spec as follows:
pyflyte --pkgs workflows package -f --image "papermill-exploration:latest"
flytectl register files --project flytesnacks --domain development --archive flyte-package.tgz --version v2
All of which works:
$ flytectl get workflows --project flytesnacks --domain development
>>>
--------- ------------------------------------ -----------------------------
| VERSION | NAME | CREATED AT |
--------- ------------------------------------ -----------------------------
| v2 | workflows.workflow.nb_to_python_wf | 2022-12-12T12:41:53.987960Z |
--------- ------------------------------------ -----------------------------
| v1 | workflows.workflow.nb_to_python_wf | 2022-12-12T12:33:08.295661Z |
--------- ------------------------------------ -----------------------------
2 rows
I then attempt to invoke the workflow, but the resulting pod cannot pull the image:
$ flytectl get execution --project flytesnacks --domain development azlfqvzfsbz4lr8pbmlt
>>>
---------------------- ------------------------------------ ------------- -------- ---------------- -------------------------------- --------------- -------------------- ---------------------------------------------------------
| NAME | LAUNCH PLAN NAME | TYPE | PHASE | SCHEDULED TIME | STARTED | ELAPSED TIME | ABORT DATA (TRUNC) | ERROR DATA (TRUNC) |
---------------------- ------------------------------------ ------------- -------- ---------------- -------------------------------- --------------- -------------------- ---------------------------------------------------------
| azlfqvzfsbz4lr8pbmlt | workflows.workflow.nb_to_python_wf | LAUNCH_PLAN | FAILED | | 2022-12-12T13:07:23.548693519Z | 23.161600293s | | [1/1] currentAttempt done. Last Error: USER::containers |
| | | | | | | | | with unready status: [azlfqvzfsbz4lr8pbmlt-n |
---------------------- ------------------------------------ ------------- -------- ---------------- -------------------------------- --------------- -------------------- ---------------------------------------------------------
1 rows
$ docker exec -it dbf8f5dcb150 kubectl -n flytesnacks-development describe pod azlfqvzfsbz4lr8pbmlt-n0-0
>>>
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 27m default-scheduler Successfully assigned flytesnacks-development/azlfqvzfsbz4lr8pbmlt-n0-0 to dbf8f5dcb150
Normal Pulling 25m (x4 over 27m) kubelet Pulling image "papermill-exploration:latest"
Warning Failed 25m (x4 over 27m) kubelet Failed to pull image "papermill-exploration:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied for papermill-exploration, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Warning Failed 25m (x4 over 27m) kubelet Error: ErrImagePull
Warning Failed 25m (x6 over 27m) kubelet Error: ImagePullBackOff
Normal BackOff 2m22s (x106 over 27m) kubelet Back-off pulling image "papermill-exploration:latest"
Have I missed something here? Are the pods not authenticated against the Docker repo? Or am I not specifying my images correctly?
Samhita Alla
flytectl sandbox exec -- docker build . --tag "papermill-exploration:latest"
command?
Tom Stokes
12/12/2022, 4:49 PM
papermill-exploration:latest image inside the container running k3s.
I think that I've found the issue - this Stack Overflow post suggests that, when the --docker flag is used with k3s, Docker is used as the container runtime rather than containerd (you can see this argument in the entrypoint script).
In that case, the image will be found if the imagePullPolicy is set to IfNotPresent. However, the default pods in the sandbox are run using the Always policy.
I've fixed this by configuring a default pod template as per the docs, applying it, and then restarting the propeller component. The new pods then use the correct policy, which fixes the problem:
$ docker exec -i $K3S_CONTAINER_ID kubectl -n flytesnacks-development get pod ap4xl5hwmwmgnkwm4spz-n0-0 -o yaml | yq '.spec.containers[0].imagePullPolicy'
>>>
IfNotPresent
where the template looks as follows:
apiVersion: v1
kind: PodTemplate
metadata:
  name: default-pod-template
template:
  spec:
    containers:
      - name: default
        image: "overwrite-me"
        imagePullPolicy: IfNotPresent
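For context on why the template has to set the policy explicitly: when a container spec omits imagePullPolicy, Kubernetes picks a default from the image reference. The sketch below restates that documented defaulting rule in Python. The function name is made up for illustration, and the papermill-exploration:v2 image tag is a hypothetical example that does not appear in this thread.

```python
def default_image_pull_policy(image: str) -> str:
    """Return the policy Kubernetes defaults to when imagePullPolicy is omitted.

    Illustrative helper only; this is a restatement of the documented rule,
    not code taken from Kubernetes or Flyte.
    """
    # A digest reference (name@sha256:...) pins exact content: IfNotPresent.
    if "@" in image:
        return "IfNotPresent"
    # Only the last path segment can carry a tag; a colon in an earlier
    # segment belongs to a registry port such as localhost:5000.
    last_segment = image.rsplit("/", 1)[-1]
    _name, sep, tag = last_segment.partition(":")
    # No tag at all, or the tag "latest", defaults to Always.
    if not sep or tag == "latest":
        return "Always"
    # Any other explicit tag defaults to IfNotPresent.
    return "IfNotPresent"


for ref in (
    "papermill-exploration:latest",  # the image used in this thread
    "papermill-exploration",         # same image with the tag omitted
    "papermill-exploration:v2",      # hypothetical non-latest tag
):
    print(f"{ref} -> {default_image_pull_policy(ref)}")
```

Under this rule, an image tagged latest gets Always unless something such as the pod template above overrides it. Building under a tag other than latest would be expected to default to IfNotPresent, although that route was not tried in this thread.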
Samhita Alla
Yee
"latest", then it will set the pull policy to Always.
Tom Stokes
12/12/2022, 6:04 PM
Ingo Kemmerzell
01/20/2023, 7:43 PM
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m40s default-scheduler Successfully assigned flytesnacks-development/fca72778d3e6f4c5c8ab-n0-0 to bc1ca7e70b3c
Normal Pulling 3m21s (x4 over 4m40s) kubelet Pulling image "ghcr.io/flyteorg/flytekit:py3.9-1.3.0"
Warning Failed 3m21s (x4 over 4m40s) kubelet Failed to pull image "ghcr.io/flyteorg/flytekit:py3.9-1.3.0": rpc error: code = Unknown desc = failed to pull and unpack image "ghcr.io/flyteorg/flytekit:py3.9-1.3.0": failed to resolve reference "ghcr.io/flyteorg/flytekit:py3.9-1.3.0": failed to do request: Head "https://ghcr.io/v2/flyteorg/flytekit/manifests/py3.9-1.3.0": dial tcp: lookup ghcr.io on ...:53: no such host
Warning Failed 3m21s (x4 over 4m40s) kubelet Error: ErrImagePull
Warning Failed 2m56s (x6 over 4m39s) kubelet Error: ImagePullBackOff
Normal BackOff 2m43s (x7 over 4m39s) kubelet Back-off pulling image "ghcr.io/flyteorg/flytekit:py3.9-1.3.0"
Samhita Alla
flytectl demo start --env HTTP_PROXY=...
Let me know if this works!
Ingo Kemmerzell
01/21/2023, 3:17 PM
docker exec flyte-sandbox env
But now I'm getting a different error message, which shows that the certificate of ghcr.io is not accepted.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18m default-scheduler Successfully assigned flytesnacks-development/axpsj4sv5xsj6z2krc57-n0-0 to d3482e31a88c
Normal Pulling 16m (x4 over 18m) kubelet Pulling image "ghcr.io/flyteorg/flytekit:py3.9-1.3.0"
Warning Failed 16m (x4 over 18m) kubelet Failed to pull image "ghcr.io/flyteorg/flytekit:py3.9-1.3.0": rpc error: code = Unknown desc = failed to pull and unpack image "ghcr.io/flyteorg/flytekit:py3.9-1.3.0": failed to resolve reference "ghcr.io/flyteorg/flytekit:py3.9-1.3.0": failed to do request: Head "https://ghcr.io/v2/flyteorg/flytekit/manifests/py3.9-1.3.0": x509: certificate signed by unknown authority
Warning Failed 16m (x4 over 18m) kubelet Error: ErrImagePull
Warning Failed 16m (x6 over 17m) kubelet Error: ImagePullBackOff
Normal BackOff 2m59s (x65 over 17m) kubelet Back-off pulling image "ghcr.io/flyteorg/flytekit:py3.9-1.3.0"
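Three different pull failures show up in this thread, and each points at a different layer. As a rough triage aid, the sketch below matches substrings copied from the kubelet events quoted above and returns a likely cause. The helper name and the cause descriptions are this sketch's own wording, not kubelet output, and the list covers only the messages seen here.

```python
# Substrings copied from the kubelet events in this thread, paired with a
# likely cause. The cause text is an interpretation, not kubelet output.
PULL_FAILURE_HINTS = [
    (
        "pull access denied",
        "remote pull attempted for an image that only exists locally; "
        "check imagePullPolicy",
    ),
    (
        "no such host",
        "registry hostname could not be resolved; check DNS or proxy settings",
    ),
    (
        "x509: certificate signed by unknown authority",
        "registry certificate is not trusted; often a TLS-intercepting proxy "
        "whose CA is not installed",
    ),
]


def diagnose_pull_failure(message: str) -> str:
    """Map a kubelet 'Failed to pull image' message to a likely cause."""
    for needle, hint in PULL_FAILURE_HINTS:
        if needle in message:
            return hint
    return "unrecognized pull failure"


event = (
    'Failed to pull image "ghcr.io/flyteorg/flytekit:py3.9-1.3.0": '
    "x509: certificate signed by unknown authority"
)
print(diagnose_pull_failure(event))
```

The first match wins, so the list order matters only if one message ever contains more than one of these substrings.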
Samhita Alla
Ingo Kemmerzell
01/23/2023, 10:20 AM
Samhita Alla
Eduardo Apolinario (eapolinario)
01/23/2023, 7:41 PM
--env should be enough for k3s (really containerd) to pick them up.
Ingo Kemmerzell
01/24/2023, 6:50 PM