With Docker 27, trying to run a Docker container which was saved using an older version of Docker results in an error:
    > python -m datalad_container.adapters.docker run container/image sh -c "echo 123"
    (...)
    RuntimeError: docker image sha256:f881bd4db45ac9775f5a5377485a7c939fea4685d7482eed4809cb83fc3b51a3 was not successfully loaded
Docker loads an image, but its ID does not match what DataLad expects based on the image that was stored:
    > docker image ls
    REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
    remodnav     latest   81aaa31870f5   16 months ago   3.8GB
This was observed when trying to reproduce paper-remodnav (versioned link), and snippets in this issue are based on that dataset.
Which software versions are affected?
Unclear. The problem was observed and later confirmed on Windows with Docker version 27.5.1. For me, the problem does not replicate on Debian 12 (bookworm) with Docker version 20.10.4 (docker.io package). @mih reports that it still works on his laptop, with v26.1.5.
As far as saving the image goes, I don't know which Docker version was used; however, I suppose it was older than 25, for reasons explained below.
Where in the code does the problem happen?
The error message comes from the datalad_container.adapters.docker function (datalad-container/datalad_container/adapters/docker.py, lines 110 to 150 in 55309f8), which ends with:
    lgr.debug("Image %s is already present", image_id)
    if image_id not in _list_images():
        raise RuntimeError(
            "docker image {} was not successfully loaded".format(image_id))
    return image_id
The function performs a relatively simple operation: it creates a tar file object from the contents of the requested directory, and pipes it directly into docker load (all done with streams, without saving intermediate files). It then compares the image ID reported by docker to the one inferred from the image stored in the dataset - this is where the error is raised.
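The expected ID is returned by get_image (datalad-container/datalad_container/adapters/docker.py, lines 88 to 107 in 55309f8). Again, the operation is relatively simple. The function opens the image manifest stored in the dataset, opens the config file it points to, and hashes its content.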
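The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code of get_image: the function name expected_image_id is hypothetical, and it assumes the old-style docker save layout in which manifest.json is a list whose first entry has a "Config" key naming the config file.

```python
import hashlib
import json
from pathlib import Path


def expected_image_id(image_dir):
    """Return the sha256 hex digest of the config file named in manifest.json.

    Hypothetical helper; assumes the old-style `docker save` layout and
    looks only at the first image listed in the manifest.
    """
    image_dir = Path(image_dir)
    # manifest.json is a JSON list with one entry per saved image.
    manifest = json.loads((image_dir / "manifest.json").read_text())
    config_name = manifest[0]["Config"]
    # The expected ID is the hash of the raw bytes of the config file.
    return hashlib.sha256((image_dir / config_name).read_bytes()).hexdigest()
```

For the dataset discussed here, a call like expected_image_id("container/image") would be expected to give the f881b... value, since that is the ID DataLad reports in the error message.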
Investigating the docker save layout and speculation about IDs
With that dataset, I am able to mimic DataLad's approach in creating the tar file, and save it to a file for further inspection and for loading with docker load -i:
    >>> import tarfile
    >>> with tarfile.open("img.tar", mode="w|", dereference=True) as tar:
    ...     tar.add("container\\image", arcname="")
Note: I tried writing the tar file on both GNU/Linux and Windows. The files had different checksums (new line characters? tar header?) but both produced the same image ID when loaded on Windows.
With that, I also tried a docker load, docker save round-trip. Docker 27 has no problem loading an image generated from the dataset content in the manner above. When saving, it produces a different layout, one that is in fact OCI compliant. See OCI image format specification and, in particular, the part about Image layout.
The change in save layout was most likely introduced in Docker 25 - the release notes for Docker Engine 25.0.0 include "The docker image save tarball output is now OCI compliant".
This is the layout of a tar file created from the dataset:
And this is the one created after running docker load and docker save:
Note that the blobs include both 81aaa (which matches the image ID reported by Docker 27) and f881b (which matches the ID that DataLad expected to see, and more than likely also the ID that Docker 20 would report).
Let's explore the new layout then (note: all JSON contents below are presented with jq for readability). First, there is manifest.json:
The manifest references the config with f881b checksum - this is the "old" config, and the one DataLad would look at when determining the expected image ID! However, according to the OCI Image Layout Specification, this manifest is a "file associated with a backwards compatible docker save format", and is not part of the spec.
The mandatory file, according to the OCI spec, is index.json, and here are its contents:
This index file points to a manifest, with a digest (81aaa) matching the ID of the image created by Docker 27. Here is the content of that manifest, i.e. blobs/sha256/81aaa...:
This manifest points to a config file with f881b digest, i.e. exactly the one from the dataset!
It would seem that it is this manifest, rather than the config file, that Docker uses as the basis for the image ID. Given that it is checksums (of the config and the layers) all the way down, this seems to be equivalent (with Docker now hashing a "higher-level" metadata file). However, I wasn't able to find any indication of the ID change in Docker's release notes or documentation, so this is speculation based on comparing the save layouts and reading the OCI spec.
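The chain index.json -> manifest blob -> config digest can be walked with a few lines of Python. This is only a sketch for inspecting an unpacked OCI-style docker save output: the function names are hypothetical, only the first manifest listed in index.json is considered, and sha256 digests are assumed.

```python
import hashlib
import json
from pathlib import Path


def blob_path(layout_dir, digest):
    """Map a digest such as 'sha256:abc...' to blobs/sha256/abc... in the layout."""
    algo, hexdigest = digest.split(":", 1)
    return Path(layout_dir) / "blobs" / algo / hexdigest


def layout_digests(layout_dir):
    """Return (manifest_digest, config_digest) for the first manifest in index.json."""
    index = json.loads((Path(layout_dir) / "index.json").read_text())
    manifest_digest = index["manifests"][0]["digest"]
    manifest_bytes = blob_path(layout_dir, manifest_digest).read_bytes()
    # Blobs are content-addressed: the file name must equal the hash of its bytes.
    actual = "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()
    if actual != manifest_digest:
        raise ValueError("manifest blob does not match its digest")
    config_digest = json.loads(manifest_bytes)["config"]["digest"]
    return manifest_digest, config_digest
```

On the round-tripped image described above, the first value would be expected to start with 81aaa (the ID Docker 27 reports) and the second with f881b (the ID DataLad expects).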
How can we fix this?
This is unclear at the moment.
If I am right that Docker 27 bases the ID on a metadata representation which is equivalent to, but different from, the file saved in the dataset, then with the old layout we can't know the ID upfront (unless we try to create the manifest ourselves, which seems doable but finicky).
One possible workaround would be to simply drop the ID check which produced an error. We would still rely on an exit code from docker load giving us some assurance that loading succeeded, so it does not sound entirely wrong.
However, the expected ID is being checked (against a list of Docker images being present) twice. The first time, it is done to decide whether the image needs to be loaded in the first place. So not changing that part would mean loading the image every time the function is called, which sounds bad.
mslw added a commit to mslw/datalad-container that referenced this issue on Feb 5, 2025:
It appears that while Docker 27 has no problem loading images saved with
older versions, it generates the ID based on the "new style"
(OCI-compliant) manifest that it would save starting with v25, and not
the config file stored in the dataset. This causes DataLad to error out
due to ID mismatch, although the ID is most likely equivalent; see datalad#269
This commit is the first attempt to solve this issue. Since the manifest
is a structured file, an attempt is made to generate a "new" style
manifest based on the contents of the saved image, and derive the ID
from that.
The manifest needs file types, sizes, and checksums. While we could copy
checksums from the previous manifest / config, we do not seem to have
the sizes. To solve that problem, we get both through ls_file_collection
from datalad-next. This is convenient and quick, but introduces a new
dependency.
The generated structure and content are guesswork based on reading the
OCI spec and seeing docker save output from a single container. It
works for that container and tries to be applicable more broadly, but
most likely won't cover more complicated cases, or those where I'm not
even sure what behavior to expect (e.g. multi-arch manifest?). Layers
are assumed to always be rootfs_diff (I currently don't know if there
are other types possible).
This commit focuses on reading older images with new Docker, and does
not address reading new images (reading images saved with Docker 26
would still fail, because it already uses the new save format which our
adapter does not expect). So the combinatorics around that will need to
be addressed later.
The new code would only trigger for Docker 27. It introduces one small
regression, where get_image_id raises a NotImplementedError for two
arguments which can be given to the old get_image.
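To illustrate what generating a "new style" manifest from the old layout could involve, here is a rough sketch. It is not the code from the commit: it hashes and sizes files directly with hashlib and os.path.getsize instead of using ls_file_collection, the function names are hypothetical, and the media types are the standard OCI ones, chosen here as an assumption about what Docker writes. The digest Docker derives depends on the exact bytes of the serialized manifest, which this sketch does not try to reproduce.

```python
import hashlib
import json
import os


def describe(path, media_type):
    """Build an OCI-style descriptor (mediaType, digest, size) for a file."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so that large layer tarballs are not read into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    return {
        "mediaType": media_type,
        "digest": "sha256:" + sha.hexdigest(),
        "size": os.path.getsize(path),
    }


def build_manifest(image_dir):
    """Assemble a manifest dict from an old-style `docker save` directory."""
    with open(os.path.join(image_dir, "manifest.json")) as f:
        old = json.load(f)[0]  # assumption: a single image in the manifest
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "config": describe(
            os.path.join(image_dir, old["Config"]),
            "application/vnd.oci.image.config.v1+json",
        ),
        "layers": [
            describe(
                os.path.join(image_dir, layer),
                "application/vnd.oci.image.layer.v1.tar",
            )
            for layer in old["Layers"]
        ],
    }
```

Comparing the output of build_manifest with the manifest blob written by a newer docker save would show whether the fields agree before any attempt to match the serialization byte for byte.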