
chore(docker): reduce size between docker builds #7571

Merged
merged 6 commits into invoke-ai:main from build/docker-dependency-layer on Mar 5, 2025

Conversation

keturn (Contributor) commented Jan 18, 2025

by adding a layer with all the pytorch dependencies that don't change most of the time.

Summary

Every time the main docker images rebuild and I pull main-cuda, it downloads another 3+ GB, which seems like about a zillion times too much, since most things don't change from one commit on main to the next.

This is an attempt to follow the guidance in Using uv in Docker: Intermediate Layers so there's one layer that installs all the dependencies—including PyTorch with its bundled nvidia libraries—before the project's own frequently-changing files are copied into the image.
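
For illustration, the intermediate-layers pattern from those uv docs looks roughly like this (a minimal sketch; the base image, paths, and lockfile name are placeholders, not this repo's exact Dockerfile):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

WORKDIR /app

# Layer 1: dependencies only. This layer stays cached until pyproject.toml
# or uv.lock changes, so the multi-GB PyTorch/nvidia wheels are not
# re-installed on every source change.
RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --frozen --no-install-project

# Layer 2: the frequently-changing project source, installed on top.
COPY . /app
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen
```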

Related Issues / Discussions

QA Instructions

Hopefully the CI system building the docker images is sufficient.

But there is one change to pyproject.toml related to xformers, so it'd be worth checking that python -m xformers.info still says it has triton on the platforms that expect it.

Merge Plan

I don't expect this to be a disruptive merge.

(An earlier revision of this PR moved the venv, but I've reverted that change at ebr's recommendation.)

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

keturn added a commit on Jan 18, 2025: by adding a layer with all the pytorch dependencies that don't change most of the time.
github-actions bot added the docker, Root, and python-deps (PRs that change python dependencies) labels on Jan 18, 2025
keturn (Contributor, Author) commented Jan 18, 2025

This Dockerfile is also quirky in that it separates builder and runtime stages, but then it puts all the build-deps in the runtime stage anyway (to build patchmatch?), which kinda defeats the purpose. But I think we can leave that alone for now as an independent concern.
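
For context, the payoff a builder/runtime split normally buys is roughly this (an illustrative sketch with made-up stage contents, not the project's actual Dockerfile):

```dockerfile
FROM python:3.12 AS builder
# Heavy build-deps (compilers, headers) live only in this stage.
RUN apt-get update && apt-get install -y --no-install-recommends build-essential
# ... build wheels / native extensions here ...

FROM python:3.12-slim AS runtime
# Only the built artifacts are copied forward, so the build toolchain
# never ships in the final image. Installing build-deps in this stage
# anyway (as the current Dockerfile does for patchmatch) gives that up.
COPY --from=builder /opt/venv /opt/venv
```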

There is one thing I haven't confirmed for the space savings: my test builds have been with podman (buildah), not BuildKit. buildah doesn't support COPY --link, so the interaction between the stages and layers isn't exactly the same…
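
For reference, the BuildKit-only instruction in question looks like this (stage and path names are illustrative):

```dockerfile
# With BuildKit, --link makes the copied files an independent layer that
# can be reused even when earlier layers change; per the comment above,
# buildah doesn't support it, so its caching behavior differs.
COPY --link --from=builder /opt/venv /opt/venv
```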

ebr (Member) commented Feb 15, 2025

hey @keturn, thanks for this! We tried this some time ago with pip, but it caused more headaches than it was worth. Now, with uv, this seems like a very sound approach.

This will work well for someone building the images locally. But just to set expectations: we're not currently doing any caching of Docker layers in GHA runners (we might revisit that at some point), so the intermediate layers will be rebuilt anyway. So if the expectation is not having to pull the 2.5 GB PyTorch layer on every rebuild, that is unfortunately still going to happen, at least for now. But again, if you're building the image locally, it should help quite a bit on rebuilds.
That said, the GHA docker builds are failing right now, so once I fix that perhaps we can rebase this PR and I'll come back to re-reviewing it. Will keep you posted.

ebr (Member) commented Feb 15, 2025

> This Dockerfile is also quirky in that it separates builder and runtime stages, but then it puts all the build-deps in the runtime stage anyway (to build patchmatch?), which kinda defeats the purpose. But I think we can leave that alone for now as an independent concern.

Indeed, this is needed to build patchmatch. I hope at some point we no longer have to do this, but for now it's OK. There's still a small benefit to keeping the separate runtime image, because we just don't need to worry about any other cruft that may be left in the builder image, so I'd like to keep it this way for the time being.

keturn and others added 3 commits February 16, 2025 10:34
Including just invokeai/version seems sufficient to appease uv sync here; including everything else would invalidate the cache we're trying to establish.
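
A hedged sketch of the idea behind that commit (the copied paths come from the commit message; the uv flags and cache mount are assumptions, not the exact diff):

```dockerfile
# Dependency layer: project metadata and lockfile only.
COPY pyproject.toml uv.lock ./
# The build reads the project version from invokeai/version, so copy just
# that module; copying the whole source tree at this point would invalidate
# the cached dependency layer on every commit.
COPY invokeai/version invokeai/version
RUN --mount=type=cache,target=/root/.cache/uv uv sync --frozen
```
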
keturn (Contributor, Author) commented Feb 16, 2025

> But just to set expectations: we're not currently doing any caching of Docker layers in GHA runners (we might revisit that at some point), so the intermediate layers will be rebuilt anyway. So if the expectation is not having to pull the 2.5 GB PyTorch layer on every rebuild, that is unfortunately still going to happen, at least for now.

This got me to revisit the docs on docker cache invalidation. From what I gather, the upshot is:

  • no build-cache means those layers will be re-built every time, and
  • bit-for-bit reproducible docker builds are still an esoteric subject, so
  • a re-built layer gets published as a new thing, even if its contents are semantically the same?

Well, that's a bit disappointing, but I guess this PR is still a step in the right direction if we're going all-in with uv.

ebr (Member) commented Feb 28, 2025

OK, looks good to me. We'll merge it and I will work on some ideas around re-introducing caching.

ebr enabled auto-merge (rebase) on March 3, 2025, 14:32
auto-merge was automatically disabled on March 3, 2025, 14:41: the rebase failed
ebr merged commit 4de6fd3 into invoke-ai:main on Mar 5, 2025 (15 checks passed)
keturn deleted the build/docker-dependency-layer branch on March 5, 2025, 21:38