# ask-the-community
s
Hi all, I've managed to get a local demo cluster working on my old laptop, with workflows running etc., but I've followed the exact same steps on my new one and I get the following error whenever I run a workflow. Note: I've tried the cookbook hello_world example and the greeting workflow, and both give the same error. I think I'm doing something stupid, but I just can't figure it out.
containers with unready status: [f0de017c01c7140d8a4c-n0-0]|Back-off pulling image "cr.flyte.org/flyteorg/flytekit:py3.9-1.5.0"
Any ideas?
k
Interesting. Can you try `docker pull cr.flyte.org/flyteorg/flytekit:py3.9-1.5.0`?
It could be the network, or a missing image?
As you can see, flytekit was silently upgraded.
Cc @Samhita Alla, can you please quickly try?
s
Works for me.
k
@Samuel Bentley, it seems like a proxy or something else is interfering.
s
So `docker pull` works fine, but in the container logs I see this:
2023-04-14 14:27:17 E0414 13:27:17.174653      73 kuberuntime_image.go:51] "Failed to pull image" err="rpc error: code = Unknown desc = failed to pull and unpack image \"ghcr.io/flyteorg/flytekit:py3.9-latest\": failed to resolve reference \"ghcr.io/flyteorg/flytekit:py3.9-latest\": failed to do request: Head \"https://ghcr.io/v2/flyteorg/flytekit/manifests/py3.9-latest\": x509: certificate signed by unknown authority" image="ghcr.io/flyteorg/flytekit:py3.9-latest"
Note: this error is in the Flyte sandbox container that gives me the initial error ^^^
j
I can't repro in a fresh sandbox. @Samuel Bentley: this probably sounds dumb, but can you restart your Docker daemon and fire up a new sandbox? I definitely had weird issues with older Docker Desktop for Mac versions, but haven't in a while.
Alternatively, you can try running a basic pod, like:
```shell
kubectl run -it --rm --image=debian debian sh
```
and see if that results in a similar issue.
s
Restarting didn't help, but the kubectl command did work. I'm going to raise a ticket with Docker, so I want to give them all the details. Is the sandbox pulling the image via Docker-in-Docker, or from within Kubernetes?
j
It's running k3s on containerd within the sandbox container.
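Because the pull happens in k3s's containerd inside the sandbox, it presumably uses the sandbox container's certificate trust store rather than the host Docker daemon's, which would explain why a host-side `docker pull` succeeds while the sandbox fails. A minimal sketch for reproducing the pull through the sandbox's own runtime follows. It assumes the sandbox container is named `flyte-sandbox` (check `docker ps`) and that the k3s-based image ships the `crictl` client; the function name `sandbox_pull` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pull an image with the sandbox's embedded containerd instead of the
# host Docker daemon, so the request uses the sandbox container's trust store.
# Assumptions: the container is named "flyte-sandbox" unless SANDBOX_CONTAINER
# says otherwise, and `crictl` is available inside the k3s-based image.

sandbox_pull() {
  local image="$1"
  local container="${SANDBOX_CONTAINER:-flyte-sandbox}"
  # Run crictl inside the sandbox so k3s's containerd performs the pull.
  docker exec "$container" crictl pull "$image"
}

# Usage (hypothetical container name):
#   sandbox_pull ghcr.io/flyteorg/flytekit:py3.9-latest
# An x509 "certificate signed by unknown authority" error here, while a host-side
# `docker pull` of the same image succeeds, points at a trust-store difference.
```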
k
Cc @Chirayu Gupta, this looks like a similar problem.
s
Thanks, I've raised a ticket with Docker; let's see what they say.
Docker support pointed me to a K3s issue whose fix seemed to work for other people hitting this with K3s. Is there anything I can change in my environment to get this fixed? https://github.com/k3s-io/k3s/issues/1148
j
It's not a private repo, though.
s
Docker gave up on helping me. I'm going to raise a bug with K3s. Can you help me answer this question from their issue template, please (in the context of the sandbox)? "Cluster Configuration: Provide some basic information on the cluster configuration. For example, '3 servers, 2 agents'."
j
@Samuel Bentley, I'd love to spend some time digging into this with you. Will DM.
We have root-caused this issue. The problem was that Zscaler was running on the machine and intercepting requests. We had to add the org-specific Zscaler root CA to the sandbox to establish trust with ghcr.io. I'll open a PR later to let users inject additional trusted certs into the sandbox.
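A quick way to confirm this kind of interception is to look at who issued the certificate your machine actually receives for a host. Behind a TLS-inspecting proxy such as Zscaler, the issuer names the proxy's CA instead of the public CA that signed the registry's real certificate. The sketch below uses standard `openssl` commands; the function names are hypothetical, and `ghcr.io` is simply the registry from this thread.

```shell
#!/usr/bin/env bash
# Sketch: check who issued the certificate a host presents to this machine.
# Behind a TLS-inspecting proxy such as Zscaler, the issuer names the proxy's
# CA rather than the public CA that really signed the registry's certificate.

# Print the subject and issuer of the first PEM certificate read from stdin.
cert_subject_and_issuer() {
  openssl x509 -noout -subject -issuer
}

# Fetch the leaf certificate HOST:PORT presents and print its subject and issuer.
# Needs network access; SNI is set so the right certificate is returned.
presented_cert() {
  local host="$1" port="${2:-443}"
  openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null \
    | cert_subject_and_issuer
}

# Usage:
#   presented_cert ghcr.io
# An issuer mentioning Zscaler (or another proxy vendor) means traffic is being
# intercepted, and the sandbox has to trust that proxy's root CA.
```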
k
@Samuel Bentley, this is what I was saying: a proxy/firewall on your machine.
j
@Samuel Bentley: would you be open to testing this just to be sure?
s
Yeah, you were right. I wonder if it's down to the Docker version for Apple silicon Macs. Everything was the same on my old Mac (Intel chip) and it worked fine.
Sure, I’ll try it out in the morning
j
Cool. If it's not merged by then, I'll trigger a separate build that you can test with.
Try placing the root CA PEM in `~/.flyte/sandbox/ca-certificates/` and run:
```shell
flytectl demo start --image=ghcr.io/flyteorg/flyte-sandbox-bundled:sha-dd75b80cda29a0be441806c3372c6cd46a35dbd9
```
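Placing the CA can be scripted. The sketch below copies a PEM file into `~/.flyte/sandbox/ca-certificates/`, the directory given above, after a basic check that the file really contains a PEM certificate. The function name `install_sandbox_ca`, the `CA_DIR` override, and the file name `zscaler-root-ca.pem` are hypothetical. The macOS export line assumes the proxy CA is in the keychain with "Zscaler" in its common name; the actual name depends on your organization.

```shell
#!/usr/bin/env bash
# Sketch: copy a PEM root CA into the directory the test sandbox image reads
# extra trusted certificates from (~/.flyte/sandbox/ca-certificates/).
# CA_DIR can be overridden, e.g. to try this without touching ~/.flyte.

install_sandbox_ca() {
  local pem="$1"
  local dir="${CA_DIR:-$HOME/.flyte/sandbox/ca-certificates}"
  # Refuse files that do not contain a PEM-encoded certificate.
  if ! grep -q -e '-----BEGIN CERTIFICATE-----' "$pem"; then
    echo "not a PEM certificate: $pem" >&2
    return 1
  fi
  mkdir -p "$dir"
  cp "$pem" "$dir/"
}

# Usage on macOS (assumption: the proxy CA's common name contains "Zscaler"):
#   security find-certificate -a -c "Zscaler" -p > zscaler-root-ca.pem
#   install_sandbox_ca zscaler-root-ca.pem
#   flytectl demo start --image=...   # the test image given above
```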
s
Yep, that's working! It did take a long time (3 minutes to run the greeting workflow), though. I don't know if that's related to the change or not.
j
OK, good to know it's working. The image runs a nightly build of Flyte.
s
Thanks for your help again 🙂
j
It might be because of an initial image pull. Is it faster if you run it again?
s
Yep, it took just 14 seconds instead of 3 minutes.
j
OK. It was likely just pulling down the image.
k
It should be faster after the next release.