# ask-ai
a
@RunLLM Recently, every time I'm registering a workflow, it checks for the docker images for the different ImageSpecs, and without any dependency changes, it keeps rebuilding the images, or at least some of them. What are some reasons this could be happening?
r
This issue could be due to the way Flyte handles image caching and versioning. If you are using ImageSpec with rebuild_always=True, Flyte will rebuild the image every time regardless of changes. Additionally, if the task versioning does not consider changes in the container image, it might trigger unnecessary rebuilds. Would you like more details on how to manage image caching and versioning in Flyte?
Sources:
- Flyte Community Discussion
- Flyte Community Discussion
- Flyte Package and Register Documentation
k
are you running non-fast register?
a
i guess probably, i'm just running pyflyte register /path/to/pipeline.py with project and domain
k
no, it's fast register
a
oh yeah, ok i'm seeing that in the docs now. yeah it was working totally fine for a while, but recently it keeps saying it can't find the image in my registry, so it rebuilds. it seems like it's at least one specific image it does this with
k
mind sharing the imageSpec definition?
if you upgrade flytekit, it will rebuild
a
yeah, i don't think i've changed anything about the image spec. The entries in DEFAULTS are all constants that point to strings like "python-dotenv==1.0.1", etc., but none of those values are changing between registrations
DEFAULTS = [
    DOTENV,
    GCS,
    BIGQUERY,
    SECRETS,
    DB_DTYPES,
    PENDULUM,
    PSUTIL,
    TQDM,
    YAML,
    TYPEGUARD,
    ML_ULTRA_CLIENT,
    OPENCV_HEADLESS,
    POLARS,
]

default_image_spec = ImageSpec(
    name="default",
    base_image="ghcr.io/flyteorg/flytekit:py3.10-1.10.2",
    packages=[*DEFAULTS],
    apt_packages=["git"],
    registry=DAI_ML_PIPELINES_REGISTRY,
)
k
so flytekit generates a different image tag when you rebuild the same ImageSpec?
When you rebuild an image, are the image urls the same?
(flytekit-3.10) ➜  flytekit git:(master) ✗ pyflyte run --remote flyte-example/improve_image_spec.py wf
Running Execution on Remote.
[2024-05-22T23:01:23.821+0800] {authenticator.py:249} INFO - Retrieved new token, expires in 86400
Image pingsutw/flytekit:DPW01P0tkuBYhsCIdr9fBA found. Skip building.
a
Looks like the urls are the same, but the tag seems to change sometimes. I'm trying to look back through my terminal history. Looks like last time I ran it the tag was the same and it found it, then this time it was different and started building
I can do some more testing, and try to build a couple times in a row with no changes and whatnot. It's just a bit frustrating because it takes a long time to build
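To make the "same spec, same tag" expectation concrete, here is a minimal sketch of a content-derived tag. It is a hypothetical stand-in, not flytekit's actual implementation: the function name `sketch_tag`, the dict layout, and the `flytekit_version` field are all assumptions for illustration. The idea is only that a tag hashed from the spec's contents stays stable while the inputs are identical and changes when any hashed input changes (for example a flytekit upgrade, as mentioned above).

```python
import base64
import hashlib
import json


def sketch_tag(spec: dict) -> str:
    """Hypothetical content-derived tag; NOT flytekit's real hashing code."""
    # Serialize with sorted keys so dict ordering cannot affect the result.
    payload = json.dumps(spec, sort_keys=True).encode("utf-8")
    digest = hashlib.md5(payload).digest()
    # URL-safe base64 without padding keeps the tag registry-friendly.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


spec = {
    "name": "default",
    "base_image": "ghcr.io/flyteorg/flytekit:py3.10-1.10.2",
    "packages": ["python-dotenv==1.0.1", "tqdm==4.65.0"],
    "apt_packages": ["git"],
    "flytekit_version": "1.10.2",  # assumed to be one of the hashed inputs
}

tag_a = sketch_tag(spec)
tag_b = sketch_tag(dict(spec))  # identical contents, separate dict object
tag_c = sketch_tag({**spec, "flytekit_version": "1.12.0"})

print(tag_a == tag_b)  # True: identical inputs give an identical tag
print(tag_a == tag_c)  # False: any changed input gives a new tag
```

If the real tag behaves like this, a tag that flips between back-to-back registrations would point at some hashed input that is not actually constant between runs.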
k
would you like to try this fast builder https://github.com/thomasjpfan/imagespec-fast-builder
we plan to use this as default builder in the flytekit
a
oh yeah, I can give that a try. does it have a minimum flytekit version to work?
k
which version of flytekit are you using now
a
1.10.2
to be honest, I still need to figure out how to update our flyte deployment version, and then I was going to update this one
k
I think you need to use at least 1.11 or 1.12
you can upgrade flytekit without upgrading backend server
a
oh ok, that's good to know. i'll try that out soon, then and see. thanks for the help!
ok yeah, so I just finished building one, then I immediately registered again, the tag was different and it started rebuilding.. however when I quit out of that one and restarted it did have the same tag as the one I just quit. seems pretty weird
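One way to narrow this down is to snapshot the spec's fields on each registration and compare them. The sketch below assumes the spec can be dumped to a plain dict (for a dataclass, `dataclasses.asdict` would be one way to get that); the helper `changed_fields` and the `source_hash` field are hypothetical and only stand in for whatever input might differ between runs.

```python
def changed_fields(before: dict, after: dict) -> list[str]:
    """Return the sorted names of fields whose values differ between two snapshots."""
    keys = before.keys() | after.keys()
    return sorted(k for k in keys if before.get(k) != after.get(k))


first_run = {
    "name": "default",
    "packages": ["python-dotenv==1.0.1", "tqdm==4.65.0"],
    "apt_packages": ["git"],
    "source_hash": "abc123",  # hypothetical field, e.g. a hash of local files
}
# Simulate a second registration where only one input drifted.
second_run = {**first_run, "source_hash": "def456"}

changed = changed_fields(first_run, second_run)
print(changed)  # ['source_hash']
```

An empty result across two registrations with different tags would suggest the difference comes from something outside the spec fields being compared.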
k
is that because you are using *DEFAULTS? i.e.
packages=[*DEFAULTS]
instead of
packages=DEFAULTS
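For reference, a quick plain-Python check of what `[*DEFAULTS]` produces compared with `DEFAULTS`. This only covers Python list semantics; whether flytekit's tag depends on anything beyond the list contents is not something this sketch can show. The two package strings are a shortened stand-in for the real list.

```python
DEFAULTS = ["python-dotenv==1.0.1", "tqdm==4.65.0"]

# Unpacking into a new list literal makes a shallow copy.
unpacked = [*DEFAULTS]

print(unpacked == DEFAULTS)            # True: same elements in the same order
print(unpacked is DEFAULTS)            # False: a separate list object
print(str(unpacked) == str(DEFAULTS))  # True: identical string representation
```

So if the tag is derived from the contents or the string form of the package list, the unpacking alone should not change it; passing `DEFAULTS` directly is still the tidier form.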
a
ah, it could be something weird with that. I used to have more than defaults in there, so it's like that, but yeah, i'll clean that up
quick question with that builder. i'm getting this error, is there something I have to add to the docker group, or something like that?
Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
k
cc @Thomas Fan ^^^
t
I've seen that error with normal docker + buildkit with caching on. Usually it happens if a previous build did not exit cleanly, so the previous lock file is still in the cache. @Kevin Su Is there a way to reset the buildkit cache in envd?
k
he is using fast builder
a
that would make sense, I quit out of the last build
k
maybe run docker stop flyte-sandbox to remove the envd buildkit daemon, or restart the docker engine
t
With my image builder, can you try docker builder prune to remove the build cache?
a
restarting didn't work, trying with the builder prune
still getting that error. when I rerun it back to back, it makes it to different parts of the build process (e.g. 3/6, 4/6, 6/6) before breaking, seemingly random which step it makes it to
t
I feel like that is another issue, but related. I updated my fast image builder to lock the cache so it is not shared. Can you update and try again?
pip install imagespec-fast-builder==0.0.17
a
ok, looks like it's still failing with the same errors:
0.327 Reading package lists...
0.919 E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
0.919 E: Unable to lock directory /var/lib/apt/lists/

ERROR: failed to solve: process "/bin/sh -c apt-get update && apt-get install -y --no-install-recommends     ca-certificates" did not complete successfully: exit code: 100
k
Thomas, should we use sudo? https://askubuntu.com/a/163010/893537
t
I'm trying really hard not to use sudo. I think there is one more underlying issue. @Andrew can you share the whole output?
a
maybe i need to add something to the docker group on my system or something? not sure. yeah, let me grab that
t
In your image spec, can you remove the base image and add flytekit into DEFAULTS? (It'll use debian:bookworm-slim, which ends up being a smaller image at runtime)
default_image_spec = ImageSpec(
    name="default",
    packages=DEFAULTS,
    apt_packages=["git"],
    registry=DAI_ML_PIPELINES_REGISTRY,
)
I also released one more fix: pip install imagespec-fast-builder==0.0.19
a
I believe that's working. I unfortunately ran out of space in docker so I'm trying again, but it definitely made it further. I'll see how this one goes
Ok, that successfully built the first image! and it definitely seemed faster, that's really nice. not sure if it fixed the image tag changing issue, but i'll see if that comes up
t
Can you share the packages you put in DEFAULTS? I'm doing some bigger improvements to my image builder and I want to make sure it works for you.
a
For sure
['python-dotenv==1.0.1', 'google-cloud-storage==2.15.0', 'google-cloud-bigquery==3.13.0', 'google-cloud-secret-manager==2.17.0', 'db-dtypes==1.1.1', 'pendulum==2.1.2', 'psutil==5.9.5', 'tqdm==4.65.0', 'pyyaml==5.3.1', 'typeguard==4.1.5', 'opencv-python-headless==4.7.0.72', 'polars==0.18.13']
I'm also still building some things. I got that failure again, but it was a different image, so I'm making sure I have them all setup right for the fast builder