
chore(docker): reduce size between docker builds #7571

Merged
merged 6 commits into invoke-ai:main from build/docker-dependency-layer on Mar 5, 2025

Conversation

keturn (Contributor) commented Jan 18, 2025

by adding a layer with all the pytorch dependencies that don't change most of the time.

Summary

Every time the main docker images rebuild and I pull main-cuda, it downloads another 3+ GB, which seems like about a zillion times too much, since most things don't change from one commit on main to the next.

This is an attempt to follow the guidance in Using uv in Docker: Intermediate Layers so there's one layer that installs all the dependencies—including PyTorch with its bundled nvidia libraries—before the project's own frequently-changing files are copied into the image.
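
For illustration, the intermediate-layers pattern from those uv docs looks roughly like this (a minimal sketch; the base image, paths, and lockfile name are placeholders, not this repo's exact Dockerfile):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

WORKDIR /app

# Layer 1: dependencies only. This layer stays cached until pyproject.toml
# or uv.lock changes, so the multi-GB PyTorch/nvidia wheels are not
# re-installed on every source change.
RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --frozen --no-install-project

# Layer 2: the frequently-changing project source, installed on top.
COPY . /app
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen
```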

Related Issues / Discussions

QA Instructions

Hopefully the CI system building the docker images is sufficient.

But there is one change to pyproject.toml related to xformers, so it'd be worth checking that python -m xformers.info still says it has triton on the platforms that expect it.

Merge Plan

I don't expect this to be a disruptive merge.

(An earlier revision of this PR moved the venv, but I've reverted that change at ebr's recommendation.)

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

keturn added a commit on Jan 18, 2025: by adding a layer with all the pytorch dependencies that don't change most of the time.
github-actions bot added the docker, Root, and python-deps (PRs that change python dependencies) labels on Jan 18, 2025
keturn (Contributor, Author) commented Jan 18, 2025

This Dockerfile is also quirky in that it separates builder and runtime stages, but then it puts all the build-deps in the runtime stage anyway (to build patchmatch?), which kinda defeats the purpose. But I think we can leave that alone for now as an independent concern.
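
For context, the payoff a builder/runtime split normally buys is roughly this (an illustrative sketch with made-up stage contents, not the project's actual Dockerfile):

```dockerfile
FROM python:3.12 AS builder
# Heavy build-deps (compilers, headers) live only in this stage.
RUN apt-get update && apt-get install -y --no-install-recommends build-essential
# ... build wheels / native extensions here ...

FROM python:3.12-slim AS runtime
# Only the built artifacts are copied forward, so the build toolchain
# never ships in the final image. Installing build-deps in this stage
# anyway (as the current Dockerfile does for patchmatch) gives that up.
COPY --from=builder /opt/venv /opt/venv
```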

There is one thing I haven't confirmed for the space savings: my test builds have been with podman (buildah), not BuildKit. buildah doesn't support COPY --link, so the interaction between the stages and layers isn't exactly the same…
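
For reference, the BuildKit-only instruction in question looks like this (stage and path names are illustrative):

```dockerfile
# With BuildKit, --link makes the copied files an independent layer that
# can be reused even when earlier layers change; per the comment above,
# buildah doesn't support it, so its caching behavior differs.
COPY --link --from=builder /opt/venv /opt/venv
```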

ebr (Member) commented Feb 15, 2025

hey @keturn, thanks for this! We tried this some time ago with pip, but it caused more headaches than it was worth. Now, with uv, this seems like a very sound approach.

This will work well for someone building the images locally. But just to set expectations: we're not currently doing any caching of Docker layers in GHA runners (we might revisit that at some point), so the intermediate layers will be rebuilt anyway. So if the expectation is not having to pull the 2.5 GB PyTorch layer on every rebuild, that is unfortunately still going to happen, at least for now. But again, if you're building the image locally, it should help quite a bit on rebuilds.
That said, the GHA docker builds are failing right now, so once I fix that perhaps we can rebase this PR and I'll come back to re-reviewing it. Will keep you posted.

ebr (Member) commented Feb 15, 2025

> This Dockerfile is also quirky in that it separates builder and runtime stages, but then it puts all the build-deps in the runtime stage anyway (to build patchmatch?), which kinda defeats the purpose. But I think we can leave that alone for now as an independent concern.

Indeed, this is needed to build patchmatch. I hope at some point we no longer have to do this, but for now it's OK. There's still a small benefit to keeping the separate runtime image, because we just don't need to worry about any other cruft that may be left in the builder image, so I'd like to keep it this way for the time being.

keturn and others added 3 commits February 16, 2025 10:34
Including just invokeai/version seems sufficient to appease uv sync here; including everything else would invalidate the cache we're trying to establish.
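
A hedged sketch of the idea behind that commit (the copied paths come from the commit message; the uv flags and cache mount are assumptions, not the exact diff):

```dockerfile
# Dependency layer: project metadata and lockfile only.
COPY pyproject.toml uv.lock ./
# The build reads the project version from invokeai/version, so copy just
# that module; copying the whole source tree at this point would invalidate
# the cached dependency layer on every commit.
COPY invokeai/version invokeai/version
RUN --mount=type=cache,target=/root/.cache/uv uv sync --frozen
```
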
keturn (Contributor, Author) commented Feb 16, 2025

> But just to set expectations: we're not currently doing any caching of Docker layers in GHA runners (we might revisit that at some point), so the intermediate layers will be rebuilt anyway. So if the expectation is not having to pull the 2.5 GB PyTorch layer on every rebuild, that is unfortunately still going to happen, at least for now.

This got me to revisit the docs on docker cache invalidation. From what I gather, the upshot is:

  • no build-cache means those layers will be re-built every time, and
  • bit-for-bit reproducible docker builds are still an esoteric subject, so
  • a re-built layer gets published as a new thing, even if its contents are semantically the same?

Well, that's a bit disappointing, but I guess this PR is still a step in the right direction if we're going all-in with uv.

ebr (Member) commented Feb 28, 2025

OK, looks good to me. We'll merge it and I will work on some ideas around re-introducing caching.

ebr enabled auto-merge (rebase) on March 3, 2025, 14:32
auto-merge was automatically disabled on March 3, 2025, 14:41: the rebase failed
ebr merged commit 4de6fd3 into invoke-ai:main on Mar 5, 2025 (15 checks passed)
keturn deleted the build/docker-dependency-layer branch on March 5, 2025, 21:38