Bake - Optimize remote cache storage for N images based on same the image. #3019

Open
ggjulio opened this issue Feb 21, 2025 · 1 comment
Labels: kind/enhancement (New feature or request), status/triage

ggjulio commented Feb 21, 2025

Description

I'm using Bake with a BuildKit remote cache.

The BuildKit cache seems inefficient when building N images that reference the same base image.
Data seems to be pushed as cache N times when N images reference the same image.

I would expect the derived images to take very little space, since it is the base image that contains the "heavy" content.

We have ~400GB of cache, when it would probably take only 110 to 150GB max without any duplication.

Example:

target "base-image" {
    # [...]
    cache-to = [
      "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/base-image:${GIT_BRANCH_SANITIZED}"
    ]
    cache-from = [
      // we use Artifactory, layers from a repo are not really duplicated when reused in another repo.
      "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/base-image:master",
      "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/base-image:${GIT_BRANCH_SANITIZED}"
    ]
}
target "image-one" {
    # [...]
    contexts = {
      base = "target:base-image"
    }
    cache-to = [
      "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/image-one:${GIT_BRANCH_SANITIZED}"
    ]
    cache-from = [
      "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/image-one:master",
      "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/image-one:${GIT_BRANCH_SANITIZED}"
    ]
}
target "image-N" {
    # [...]
    contexts = {
      base = "target:base-image"
    }
    cache-to = [
      "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/image-N:${GIT_BRANCH_SANITIZED}"
    ]
    cache-from = [
      "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/image-N:master",
      "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/image-N:${GIT_BRANCH_SANITIZED}"
    ]
}

Any information or ideas would be appreciated.

Let me know if a reproduction repo is needed.

ggjulio added the kind/enhancement (New feature or request) and status/triage labels on Feb 21, 2025
ggjulio changed the title from "Bake - Optimizing buildkit cache for images based on same base image." to "Bake - Optimize remote cache storage for N images based on same the image." on Feb 21, 2025
tonistiigi (Member) commented

> max without any duplication.

I'm not sure what you mean by duplication there. The registry API is content-addressable: every blob is accessed by its content checksum, so there can't be any byte-for-byte duplication. So perhaps you mean that something is logically equal but not exactly equal, or that you expect a cache match but don't see it in the build output.
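
To illustrate the content-addressing point, here is a minimal Python sketch. It is a toy model, not the registry's or BuildKit's actual code: `BlobStore`, the layer bytes, and the one-byte difference are all hypothetical. It only shows why identical bytes pushed N times occupy storage once, while a layer that is logically equal but differs in any byte gets a new digest and is stored again.

```python
import hashlib


class BlobStore:
    """Toy content-addressable store: blobs are keyed by their SHA-256 digest."""

    def __init__(self):
        self.blobs = {}

    def push(self, data: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        # Pushing bytes that are already present does not add storage.
        self.blobs.setdefault(digest, data)
        return digest

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


store = BlobStore()
base_layer = b"base-image layer contents" * 1000  # hypothetical layer bytes

# Three targets pushing the exact same layer bytes: one digest, one blob.
digests = {store.push(base_layer) for _ in range(3)}

# A rebuilt layer that is "logically equal" but differs by one byte
# (for example an embedded timestamp) hashes differently and is stored again.
rebuilt_layer = base_layer + b"\x00"
rebuilt_digest = store.push(rebuilt_layer)

print("distinct digests for identical pushes:", len(digests))
print("blobs stored:", len(store.blobs))
print("bytes stored:", store.stored_bytes())
```

If the cache really is growing N times, the second case is the one to look for: base layers that are rebuilt rather than reused, so each derived image's cache refers to different digests.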

If you have a case where you expect a cache match but are not seeing it, then we need reproducible commands in order to debug what is going on.

If you are using Bake and building multiple targets together, then note that pushing to a registry reference will overwrite the content that was there before. You do seem to be using separate references in your example though.
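
The overwrite behaviour can be pictured with a small Python sketch. This is a simplified model under stated assumptions, not registry code: `TagIndex`, the ref names, and the digests are hypothetical. A reference is treated as a mutable pointer to a single manifest digest, so a second push to the same reference replaces the first.

```python
class TagIndex:
    """Toy model of registry references: each ref points at one manifest digest."""

    def __init__(self):
        self.refs = {}

    def push(self, ref: str, manifest_digest: str):
        # Return whatever the ref pointed at before, then overwrite it.
        previous = self.refs.get(ref)
        self.refs[ref] = manifest_digest
        return previous


registry = TagIndex()

# Two targets exporting cache to the same ref: the second push replaces the first.
registry.push("cache/shared:branch", "sha256:aaa")
replaced = registry.push("cache/shared:branch", "sha256:bbb")

# One ref per target, as in the example in this issue: both entries survive.
registry.push("cache/image-one:branch", "sha256:aaa")
registry.push("cache/image-N:branch", "sha256:bbb")

print("replaced manifest:", replaced)
print("current refs:", registry.refs)
```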
