I'm using bake with BuildKit's remote (registry) cache.
The BuildKit cache seems inefficient when building N images that reference the same base image: cache data appears to be pushed N times, once per derived image.
I would expect the derived images' cache to take very little space, since it is the base image that contains the "heavy" content.
We have ~400 GB of cache when, without any duplication, it would probably only take 110-150 GB at most.
Example:
```hcl
target "base-image" {
  # [...]
  cache-to = [
    "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/base-image:${GIT_BRANCH_SANITIZED}"
  ]
  // We use Artifactory; layers from a repo are not really duplicated when reused in another repo.
  cache-from = [
    "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/base-image:master",
    "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/base-image:${GIT_BRANCH_SANITIZED}"
  ]
}

target "image-one" {
  # [...]
  contexts = {
    base = "target:base-image"
  }
  cache-to = [
    "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/image-one:${GIT_BRANCH_SANITIZED}"
  ]
  cache-from = [
    "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/image-one:master",
    "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/image-one:${GIT_BRANCH_SANITIZED}"
  ]
}

target "image-N" {
  # [...]
  contexts = {
    base = "target:base-image"
  }
  cache-to = [
    "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/image-N:${GIT_BRANCH_SANITIZED}"
  ]
  cache-from = [
    "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/image-N:master",
    "${BAKE_CACHE_OPTS},ref=${BAKE_CACHE_REGISTRY}/image-N:${GIT_BRANCH_SANITIZED}"
  ]
}
```
Any info/ideas appreciated.
Let me know if a reproduction repo is needed.
ggjulio changed the title from "Bake - Optimizing buildkit cache for images based on same base image." to "Bake - Optimize remote cache storage for N images based on same the image." on Feb 21, 2025.
I'm not sure what you mean by duplication here. The registry API is content-addressable, with every blob accessed by its content checksum, so there can't be any byte-by-byte duplication. So you might mean that something is logically equal but not exactly equal, or that you expect a cache match but don't see it in the build output.
If you have a case where you expect a cache match but are not seeing one, then we need reproducible commands in order to debug what is going on.
If you are using bake and building multiple targets together, note that pushing to a registry reference will overwrite the content that was there before. You do seem to be using separate references in your example, though.
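The content addressing described above can be illustrated in a few lines: a registry keys each blob by its sha256 digest, so two pushes of byte-identical content resolve to the same key and are stored once. A minimal sketch, with an in-memory dict standing in for the registry's blob store:

```python
import hashlib

blob_store = {}  # digest -> bytes, standing in for registry blob storage

def push_blob(content: bytes) -> str:
    # A content-addressable store derives the key from the bytes themselves,
    # so pushing the same bytes twice is idempotent.
    digest = "sha256:" + hashlib.sha256(content).hexdigest()
    blob_store[digest] = content
    return digest

layer = b"heavy base-image layer"
d1 = push_blob(layer)   # pushed via base-image's cache ref
d2 = push_blob(layer)   # pushed again via image-one's cache ref
print(d1 == d2)         # True
print(len(blob_store))  # 1 -- identical bytes are stored exactly once
```

This is why byte-identical layers referenced from many cache refs cannot be duplicated registry-side, even if each ref's manifest lists them.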