Using a new Docker version (27) to run an old-ish docker image saved in the dataset causes DataLad error (image ID mismatch) #269

mslw · 2025-02-04T14:31:52Z

With Docker 27, trying to run a Docker image that was saved using an older version of Docker results in an error:

>python -m datalad_container.adapters.docker run container/image sh -c "echo 123"
(...)
RuntimeError: docker image sha256:f881bd4db45ac9775f5a5377485a7c939fea4685d7482eed4809cb83fc3b51a3 was not successfully loaded

Docker loads an image, but its ID does not match what DataLad expects based on the image that was stored:

>docker image ls
REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
remodnav     latest    81aaa31870f5   16 months ago   3.8GB

This was observed when trying to reproduce paper-remodnav (versioned link), and snippets in this issue are based on that dataset.

Which software versions are affected?

Unclear. The problem was observed and later confirmed on Windows with Docker version 27.5.1. For me, the problem does not reproduce on Debian 12 (bookworm) with Docker version 20.10.4 (docker.io package). @mih reports that it still works on his laptop, with v26.1.5.

I don't know which Docker version was used to save the image; however, I assume it was older than 25, for reasons explained below.

Where in the code does the problem happen?

The error message comes from the load function in datalad_container.adapters.docker:

datalad-container/datalad_container/adapters/docker.py

Lines 110 to 150 in 55309f8

    
    def load(path, repo_tag, config):
        """Load the Docker image from `path`.

        Parameters
        ----------
        path : str
            A directory with an extracted tar archive.
        repo_tag : str or None
            `image:tag` of image to load
        config : str or None
            "Config" value or prefix of image to load

        Returns
        -------
        The image ID (str)
        """
        # FIXME: If we load a dataset, it may overwrite the current tag. Say that
        # (1) a dataset has a saved neurodebian:latest from a month ago, (2) a
        # newer neurodebian:latest has been pulled, and (3) the old image have been
        # deleted (e.g., with 'docker image prune --all'). Given all three of these
        # things, loading the image from the dataset will tag the old neurodebian
        # image as the latest.
        image_id = "sha256:" + get_image(path, repo_tag, config)
        if image_id not in _list_images():
            lgr.debug("Loading %s", image_id)
            cmd = ["docker", "load"]
            p = sp.Popen(cmd, stdin=sp.PIPE, stdout=sp.PIPE, stderr=sp.PIPE)
            with tarfile.open(fileobj=p.stdin, mode="w|", dereference=True) as tar:
                tar.add(path, arcname="")
            out, err = p.communicate()
            return_code = p.poll()
            if return_code:
                lgr.warning("Running %r failed: %s", cmd, err.decode())
                raise sp.CalledProcessError(return_code, cmd, output=out)
        else:
            lgr.debug("Image %s is already present", image_id)
        if image_id not in _list_images():
            raise RuntimeError(
                "docker image {} was not successfully loaded".format(image_id))
        return image_id

The function performs a relatively simple operation: it creates a tar stream from the contents of the requested directory and pipes it directly into docker load (all done with streams, without saving intermediate files). It then checks whether the image ID inferred from the image stored in the dataset appears among the images that Docker lists; this is where the error is raised.
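To see why the mismatch surfaces as this particular error, the control flow of load can be modelled without Docker. In the sketch below, load_and_verify, fake_load and the in-memory registry are hypothetical stand-ins for load, the docker load pipe and _list_images respectively; the simulated Docker registers the 81aaa ID instead of the expected f881b one, as observed with Docker 27.

```python
from typing import Callable, List, Set

# Full IDs from this issue: expected (config-based) vs. reported by Docker 27.
EXPECTED = "sha256:f881bd4db45ac9775f5a5377485a7c939fea4685d7482eed4809cb83fc3b51a3"
REPORTED = "sha256:81aaa31870f52a6265bef39d0be0df7f82bab3839344ec8da54cc6c18e3fd7a0"


def load_and_verify(expected_id: str,
                    list_images: Callable[[], Set[str]],
                    docker_load: Callable[[], None]) -> str:
    """Mirror the two membership checks in load(), without Docker."""
    # First check: skip loading if the expected ID is already listed.
    if expected_id not in list_images():
        docker_load()
    # Second check: after loading, the expected ID must be listed.
    if expected_id not in list_images():
        raise RuntimeError(
            "docker image {} was not successfully loaded".format(expected_id))
    return expected_id


registry: Set[str] = set()   # IDs the simulated Docker knows about
load_calls: List[int] = []   # one entry per simulated `docker load`


def fake_load() -> None:
    """Simulated Docker 27: loading succeeds but registers another ID."""
    load_calls.append(1)
    registry.add(REPORTED)


errors = []
for _ in range(2):
    try:
        load_and_verify(EXPECTED, lambda: set(registry), fake_load)
    except RuntimeError as exc:
        errors.append(str(exc))

print(len(errors), len(load_calls))  # 2 2
```

Both calls load the image again and both fail: with a mismatched expected ID, the first check never finds the image either.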

The expected ID is returned by get_image:

datalad-container/datalad_container/adapters/docker.py

Lines 88 to 107 in 55309f8

    
    def get_image(path, repo_tag=None, config=None):
        """Return the image ID of the image extracted at `path`.
        """
        manifest_path = op.join(path, "manifest.json")
        with open(manifest_path) as fp:
            manifest = json.load(fp)
        if repo_tag is not None:
            manifest = [img for img in manifest if repo_tag in (img.get("RepoTags") or [])]
        if config is not None:
            manifest = [img for img in manifest if img["Config"].startswith(config)]
        if len(manifest) == 0:
            raise ValueError(f"No matching images found in {manifest_path}")
        elif len(manifest) > 1:
            raise ValueError(
                f"Multiple images found in {manifest_path}; disambiguate with"
                " --repo-tag or --config"
            )
        with open(op.join(path, manifest[0]["Config"]), "rb") as stream:
            return hashlib.sha256(stream.read()).hexdigest()

Again, the operation is relatively simple: the function opens the image manifest stored in the dataset, opens the config file it points to, and returns the SHA-256 digest of that file's content.
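As a worked example of what get_image computes, the sketch below builds a tiny, made-up extracted image in a temporary directory and derives the ID the same way (single-image case only; config_based_id is a hypothetical name and the config content is invented). In the legacy layout the config file is named after its own SHA-256 digest (compare the f881b... .json file in the layout listing below), so the resulting ID also matches the file name.

```python
import hashlib
import json
import os.path as op
import tempfile


def config_based_id(path: str) -> str:
    """Re-derive the ID the way get_image() does, for a single image."""
    with open(op.join(path, "manifest.json")) as fp:
        manifest = json.load(fp)
    # Hash the raw bytes of the config file the manifest points to.
    with open(op.join(path, manifest[0]["Config"]), "rb") as stream:
        return hashlib.sha256(stream.read()).hexdigest()


with tempfile.TemporaryDirectory() as d:
    config = b'{"architecture": "amd64", "os": "linux"}'  # made-up config
    name = hashlib.sha256(config).hexdigest() + ".json"   # legacy naming
    with open(op.join(d, name), "wb") as f:
        f.write(config)
    with open(op.join(d, "manifest.json"), "w") as f:
        json.dump([{"Config": name, "RepoTags": ["demo:latest"], "Layers": []}], f)
    image_id = "sha256:" + config_based_id(d)

print(image_id == "sha256:" + name[: -len(".json")])  # True
```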

Investigating the docker save layout and speculation about IDs

With that dataset, I was able to mimic DataLad's approach to creating the tar file, and save it to a file for further inspection and for loading with docker load -i:

>>> import tarfile
>>> with tarfile.open("img.tar", mode="w|", dereference=True) as tar:
...     tar.add("container\\image", arcname="")

Note: I tried writing the tar file on both GNU/Linux and Windows. The resulting files had different checksums (line endings? tar headers?), but both produced the same image ID when loaded on Windows.
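One plausible explanation for the differing checksums (the note above only guesses at the cause) is tar header metadata: two archives whose members have byte-identical content can still differ, for example in a member's modification time. The sketch below demonstrates this with Python's tarfile module. Since the IDs discussed here are digests of files inside the archive rather than of the archive itself, this is consistent with both archives yielding the same image ID; it does not prove that this is what happened on Windows.

```python
import hashlib
import io
import tarfile


def tar_bytes(payload: bytes, mtime: int) -> bytes:
    """Build an in-memory tar archive holding one file with a given mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("config.json")
        info.size = len(payload)
        info.mtime = mtime  # header metadata only, not file content
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def member_digest(archive: bytes) -> str:
    """SHA-256 of the member's content, ignoring all header fields."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        data = tar.extractfile("config.json").read()
    return hashlib.sha256(data).hexdigest()


payload = b'{"os": "linux"}'
archive_a = tar_bytes(payload, mtime=0)
archive_b = tar_bytes(payload, mtime=1_700_000_000)

# Same content, different header metadata: the archive checksums differ...
print(hashlib.sha256(archive_a).hexdigest() == hashlib.sha256(archive_b).hexdigest())  # False
# ...while the content digests are identical.
print(member_digest(archive_a) == member_digest(archive_b))  # True
```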

With that, I also tried a docker load / docker save round-trip. Docker 27 has no problem loading an image generated from the dataset content in the manner above. When saving, however, it produces a different layout; in fact, one that is OCI-compliant. See the OCI image format specification and, in particular, the part about the image layout.

The change in save layout was most likely introduced in Docker 25 - the release notes for Docker Engine 25.0.0 include "The docker image save tarball output is now OCI compliant".

This is the layout of a tar file created from the dataset:

img_dataset
├── 360338cd2a802f4812f06fbc50237a42bc0303390efa7fa321c381e6ec36d1ae
│   ├── json
│   ├── layer.tar
│   └── VERSION
├── 705094a41713537ec5205e79423114633a7225bae388e7ba823d92126c6b36c0
│   ├── json
│   ├── layer.tar
│   └── VERSION
├── f881bd4db45ac9775f5a5377485a7c939fea4685d7482eed4809cb83fc3b51a3.json
├── manifest.json
└── repositories

And this is the one created after running docker load and docker save:

img_load_save
├── blobs
│   └── sha256
│       ├── 81aaa31870f52a6265bef39d0be0df7f82bab3839344ec8da54cc6c18e3fd7a0
│       ├── d310e774110ab038b30c6a5f7b7f7dd527dbe527854496bd30194b9ee6ea496e
│       ├── e2728fc6d2c404f7b41e0fa4f889117090f4476eefab2bca48d7164dcbf7a0cb
│       └── f881bd4db45ac9775f5a5377485a7c939fea4685d7482eed4809cb83fc3b51a3
├── index.json
├── manifest.json
└── oci-layout

Note that the blobs include both 81aaa (which matches the image ID reported by Docker 27) and f881b (which matches the ID that DataLad expected to see, and more than likely also the ID that Docker 20 would report).
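The two listings differ in their top-level entries, which suggests a simple way to tell the layouts apart before deciding how to compute an expected ID. The sketch below is a heuristic based only on these two listings (save_layout is a hypothetical helper, not part of datalad-container). Because the OCI layout still ships a legacy-style manifest.json, the OCI markers have to be checked first.

```python
import os
import tempfile


def save_layout(path: str) -> str:
    """Classify an extracted `docker save` directory (heuristic)."""
    entries = set(os.listdir(path))
    # Check the OCI markers first: the OCI layout also contains manifest.json.
    if {"oci-layout", "index.json"} <= entries:
        return "oci"
    if "manifest.json" in entries:
        return "legacy"
    raise ValueError(f"Unrecognized image layout in {path}")


def touch(*parts: str) -> None:
    """Create an empty file."""
    open(os.path.join(*parts), "w").close()


with tempfile.TemporaryDirectory() as legacy, tempfile.TemporaryDirectory() as oci:
    touch(legacy, "manifest.json")
    touch(legacy, "repositories")
    os.mkdir(os.path.join(oci, "blobs"))
    for name in ("manifest.json", "index.json", "oci-layout"):
        touch(oci, name)
    results = (save_layout(legacy), save_layout(oci))

print(results)  # ('legacy', 'oci')
```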

Let's explore the new layout, then (note: all JSON contents below are pretty-printed with jq for readability). First, there is manifest.json:

[
  {
    "Config": "blobs/sha256/f881bd4db45ac9775f5a5377485a7c939fea4685d7482eed4809cb83fc3b51a3",
    "RepoTags": [
      "remodnav:latest"
    ],
    "Layers": [
      "blobs/sha256/d310e774110ab038b30c6a5f7b7f7dd527dbe527854496bd30194b9ee6ea496e",
      "blobs/sha256/e2728fc6d2c404f7b41e0fa4f889117090f4476eefab2bca48d7164dcbf7a0cb"
    ]
  }
]

The manifest references the config with the f881b checksum - this is the "old" config, and the one DataLad would look at when determining the expected image ID! However, according to the OCI Image Layout Specification, this manifest is a "file associated with a backwards compatible docker save format", and is not part of the spec.

The mandatory file, according to the OCI spec, is index.json, and here are its contents:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:81aaa31870f52a6265bef39d0be0df7f82bab3839344ec8da54cc6c18e3fd7a0",
      "size": 586,
      "annotations": {
        "io.containerd.image.name": "docker.io/library/remodnav:latest",
        "org.opencontainers.image.ref.name": "latest"
      }
    }
  ]
}

This index file points to a manifest whose digest (81aaa) matches the image ID reported by Docker 27.

Here is the content of that manifest, i.e. blobs/sha256/81aaa...:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:f881bd4db45ac9775f5a5377485a7c939fea4685d7482eed4809cb83fc3b51a3",
    "size": 3157
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar",
      "digest": "sha256:d310e774110ab038b30c6a5f7b7f7dd527dbe527854496bd30194b9ee6ea496e",
      "size": 77814784
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar",
      "digest": "sha256:e2728fc6d2c404f7b41e0fa4f889117090f4476eefab2bca48d7164dcbf7a0cb",
      "size": 1750877184
    }
  ]
}

This manifest points to a config file with the f881b digest, i.e. exactly the one from the dataset!

It would seem that Docker uses this manifest, rather than the config file, as the basis for the image ID. However, given that it is checksums (of the config and the layers) all the way down, the two seem equivalent, with Docker now hashing a "higher-level" metadata file. I wasn't able to find any mention of this ID change in Docker's release notes or documentation, so this is speculation based on comparing the save layouts and reading the OCI spec.
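The chain described above (index.json names the manifest by its digest, and the manifest names the config by its digest) can be followed programmatically. The sketch below assumes a single-image OCI layout extracted to a directory and sha256 digests only; it resolves both digests and checks that each blob's content hashes to its name. The helper names are hypothetical, and multi-image or multi-platform indexes are not handled. On the layout above, the two digests would be 81aaa... (the ID reported by Docker 27) and f881b... (the ID DataLad expects).

```python
import hashlib
import json
import os
import tempfile


def blob_path(root: str, digest: str) -> str:
    """Map 'sha256:<hex>' to <root>/blobs/sha256/<hex>."""
    algorithm, hexdigest = digest.split(":", 1)
    return os.path.join(root, "blobs", algorithm, hexdigest)


def read_verified(root: str, digest: str) -> bytes:
    """Read a sha256 blob and check that its content matches its digest."""
    with open(blob_path(root, digest), "rb") as f:
        data = f.read()
    if "sha256:" + hashlib.sha256(data).hexdigest() != digest:
        raise ValueError(f"Content of blob {digest} does not match its digest")
    return data


def resolve(root: str) -> tuple:
    """Return (manifest digest, config digest) for a single-image layout."""
    with open(os.path.join(root, "index.json")) as f:
        index = json.load(f)
    manifest_digest = index["manifests"][0]["digest"]
    manifest = json.loads(read_verified(root, manifest_digest))
    config_digest = manifest["config"]["digest"]
    read_verified(root, config_digest)  # integrity check only
    return manifest_digest, config_digest


def write_blob(root: str, data: bytes) -> str:
    """Store data under its own sha256 digest, as in blobs/sha256/."""
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    path = blob_path(root, digest)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return digest


# Tiny made-up layout with an empty layer list.
with tempfile.TemporaryDirectory() as root:
    config = b'{"os": "linux"}'
    cfg_digest = write_blob(root, config)
    man_digest = write_blob(root, json.dumps({
        "schemaVersion": 2,
        "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
        "config": {"mediaType": "application/vnd.docker.container.image.v1+json",
                   "digest": cfg_digest, "size": len(config)},
        "layers": [],
    }).encode())
    with open(os.path.join(root, "index.json"), "w") as f:
        json.dump({"schemaVersion": 2, "manifests": [{"digest": man_digest}]}, f)
    resolved = resolve(root)

print(resolved == (man_digest, cfg_digest))  # True
```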

How can we fix this?

This is unclear at the moment.

If I am right that Docker 27's ID is based on a metadata representation that is equivalent to, but different from, the file saved in the dataset, then with the old layout we can't know the ID upfront (unless we create the manifest ourselves, which seems doable but finicky).
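A minimal sketch of "creating the manifest ourselves" from the legacy layout, under three assumptions: (a) each layer.tar is stored uncompressed and its SHA-256 equals the layer digest Docker 27 uses; (b) all layers use the rootfs.diff.tar media type, as in the manifest above; and (c) Docker serializes the manifest as compact JSON with the key order shown above. The helper names are hypothetical. Assumption (c) is the finicky part, since any byte-level difference changes the digest. For what it's worth, the values in the manifest above serialize this way to exactly 586 bytes, the size index.json records for it; that is consistent with, but not proof of, a byte-for-byte match.

```python
import hashlib
import json
import os
import tempfile

MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"
CONFIG_V1 = "application/vnd.docker.container.image.v1+json"
LAYER_TAR = "application/vnd.docker.image.rootfs.diff.tar"  # assumption (b)


def descriptor(media_type: str, path: str) -> dict:
    """Media type, SHA-256 digest and size of a file, hashed in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return {"mediaType": media_type,
            "digest": "sha256:" + h.hexdigest(),
            "size": os.path.getsize(path)}


def predicted_manifest(path: str) -> bytes:
    """Build a Docker v2 manifest for a legacy `docker save` directory."""
    with open(os.path.join(path, "manifest.json")) as f:
        legacy = json.load(f)[0]  # single-image case only
    manifest = {
        "schemaVersion": 2,
        "mediaType": MANIFEST_V2,
        "config": descriptor(CONFIG_V1, os.path.join(path, legacy["Config"])),
        "layers": [descriptor(LAYER_TAR, os.path.join(path, layer))
                   for layer in legacy["Layers"]],
    }
    # Compact separators, insertion-ordered keys: assumption (c).
    return json.dumps(manifest, separators=(",", ":")).encode()


def predicted_id(path: str) -> str:
    return "sha256:" + hashlib.sha256(predicted_manifest(path)).hexdigest()


# Tiny made-up legacy layout: one config file and one "layer".
with tempfile.TemporaryDirectory() as d:
    config = b'{"os": "linux"}'
    config_name = hashlib.sha256(config).hexdigest() + ".json"
    os.mkdir(os.path.join(d, "layer1"))
    for rel, data in ((config_name, config), ("layer1/layer.tar", b"\0" * 1024)):
        with open(os.path.join(d, rel), "wb") as f:
            f.write(data)
    with open(os.path.join(d, "manifest.json"), "w") as f:
        json.dump([{"Config": config_name, "Layers": ["layer1/layer.tar"]}], f)
    built = json.loads(predicted_manifest(d))
    image_id = predicted_id(d)
```

Because the config digest is embedded in the manifest, any change to the config also changes the manifest-based ID.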

One possible workaround would be to simply drop the ID check that produced the error. We would still rely on the exit code of docker load to give us some assurance that loading succeeded, so this does not sound entirely wrong.

However, the expected ID is checked against the list of images present in Docker twice. The first check decides whether the image needs to be loaded at all. Since the expected ID would not match under Docker 27, leaving that part unchanged would mean loading the image every time the function is called, which sounds bad.


It appears that while Docker 27 has no problem loading images saved with older versions, it generates the ID based on the "new style" (OCI-compliant) manifest that it would save starting with v25, and not on the config file stored in the dataset. This causes DataLad to error out due to an ID mismatch, although the ID is most likely equivalent; see datalad#269.

This commit is a first attempt to solve this issue. Since the manifest is a structured file, it attempts to generate a "new style" manifest based on the contents of the saved image, and to derive the ID from that. The manifest needs file types, sizes, and checksums. While we could copy checksums from the previous manifest / config, we do not seem to have the sizes. To solve that problem, we get both through ls_file_collection from datalad-next. This is convenient and quick, but introduces a new dependency.

The generated structure and content are guesswork based on reading the OCI spec and seeing docker save output from a single container. It works for that container and tries to be applicable more broadly, but most likely won't cover more complicated cases, or those where I'm not even sure what behavior to expect (e.g. multi-arch manifests?). Layers are assumed to always be rootfs_diff (I currently don't know whether other types are possible).

This commit focuses on reading older images with new Docker, and does not address reading new images (reading images saved with Docker 26 would still fail, because it already uses the new save format, which our adapter does not expect). The combinatorics around that will need to be addressed later. The new code would only trigger for Docker 27. It introduces one small regression: get_image_id raises a NotImplementedError for two arguments that can be given to the old get_image.
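The commit above limits the new code path to Docker 27. A minimal sketch of such a version gate follows; the helper names are hypothetical, and treating every version from 27 upward like 27 is an assumption (the reports in this issue only cover 20.10.4, 26.1.5 and 27.5.1). docker version --format '{{.Server.Version}}' prints the daemon's version string.

```python
import subprocess


def parse_major(version: str) -> int:
    """Major version from a string such as '27.5.1'."""
    return int(version.strip().split(".", 1)[0])


def docker_server_major() -> int:
    """Ask the local Docker daemon for its version (requires Docker)."""
    out = subprocess.run(
        ["docker", "version", "--format", "{{.Server.Version}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_major(out)


def expected_image_id(major: int, config_id: str, manifest_id: str) -> str:
    """Pick which candidate ID to expect after `docker load`.

    Assumption: versions newer than 27 behave like 27.
    """
    return manifest_id if major >= 27 else config_id


print(parse_major("27.5.1\n"), expected_image_id(26, "cfg", "man"))  # 27 cfg
```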

This was referenced Feb 5, 2025

Partial fix for predicting Docker 27 image IDs from older images #270 (Draft)

Unable to rerun containerized workflow with Docker 27 psychoinformatics-de/paper-remodnav#27 (Open)
