add deps-tf1 and docker-cuda-tf1 #1186

bertsky · 2024-02-09T12:27:10Z

This would allow specifying `FROM ocrd/core-cuda-tf1` for all modules that depend on TensorFlow 1, so that this (huge!) Docker layer can be shared.

The same could be worked out for TF2 and PyTorch.
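
A minimal Python sketch of the idea: each module picks its base image by ML framework, so all modules on the same framework build on one shared layer. The image names are the ones proposed in this PR; the helper functions, the framework keys, and the fallback to plain `ocrd/core-cuda` are hypothetical.

```python
from typing import Optional

# Image names as proposed in this PR; the mapping itself is a hypothetical helper.
BASE_IMAGES = {
    "tf1": "ocrd/core-cuda-tf1",
    "tf2": "ocrd/core-cuda-tf2",
    "torch": "ocrd/core-cuda-torch",
}
# Assumption: modules without a specific framework use the plain CUDA base.
DEFAULT_BASE = "ocrd/core-cuda"


def base_image(framework: Optional[str]) -> str:
    """Pick the shared base image for a module's ML framework (None = no framework)."""
    if framework is None:
        return DEFAULT_BASE
    return BASE_IMAGES.get(framework, DEFAULT_BASE)


def from_line(framework: Optional[str]) -> str:
    """Render the first instruction of the module's Dockerfile."""
    return f"FROM {base_image(framework)}"
```

For example, `from_line("tf1")` yields `FROM ocrd/core-cuda-tf1`. Because Docker stores identical layers only once, every image built on that base reuses the TF1 layer instead of carrying its own copy.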

kba

Makes sense and LGTM. I'm testing to verify and will merge soon, so we can test the deployment.

bertsky · 2024-02-12T08:31:29Z

Unfortunately, unlike when I first came up with the `deps-cuda` solution, the sweet spot where CUDA dependencies for both Torch and TF could easily be combined seems to have vanished, resulting in a much larger image.

It looks like an older version of nvidia-tensorflow might work better, but I'll have to do further analysis.

bertsky · 2024-02-14T09:04:01Z

Update: we definitely need to hold at `nvidia-tensorflow==1.15.5+nv22.11`, because all later versions depend on CUDA > 11.8, but 11.8 is the last CUDA version supported by TensorFlow 2.x on Python 3.8. So either our `ocrd/core-cuda` must use CUDA 11.8, or we have to start bifurcating even that (i.e. `ocrd/core-cuda11` vs. `ocrd/core-cuda12`).
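
The pin can be expressed as a small check. This sketch assumes only what is stated above: nvidia-tensorflow releases up to and including `nv22.11` work with CUDA 11.8, and every later release needs newer CUDA. Parsing the `+nvYY.MM` local version suffix this way and the function names are illustrative, not part of any OCR-D recipe.

```python
import re
from typing import Tuple

# Assumption (from the discussion above): releases up to and including nv22.11
# run on CUDA 11.8; every later nvidia-tensorflow release needs newer CUDA.
LAST_CUDA_11_8_RELEASE = (22, 11)


def nv_release(version: str) -> Tuple[int, int]:
    """Extract (YY, MM) from a local version tag like '1.15.5+nv22.11'."""
    match = re.search(r"\+nv(\d+)\.(\d+)$", version)
    if match is None:
        raise ValueError(f"no +nvYY.MM suffix in {version!r}")
    return int(match.group(1)), int(match.group(2))


def fits_cuda_11_8(version: str) -> bool:
    """True if this nvidia-tensorflow release can share a CUDA 11.8 base image."""
    return nv_release(version) <= LAST_CUDA_11_8_RELEASE
```

Comparing `(YY, MM)` integer tuples rather than raw strings avoids the lexicographic trap where `"22.9"` would sort after `"22.11"`.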

Unfortunately, the breaking changes in recent NumPy pose an additional difficulty: the older releases of TF etc. are incompatible with them, so NumPy must now be held at <1.24. More recent releases of nvidia-tensorflow declare this constraint themselves, but with the older release we must add it post hoc.
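
A post-hoc guard for the NumPy bound could look like the following minimal sketch. NumPy 1.24 removed long-deprecated aliases such as `np.float` and `np.int`, which older code may still reference. The helper names are hypothetical; the pin itself would still be an ordinary pip requirement such as `numpy<1.24`.

```python
import re
from importlib.metadata import version  # stdlib since Python 3.8

# Upper bound from the discussion above: older TF releases need numpy < 1.24.
NUMPY_UPPER_BOUND = (1, 24)


def numpy_is_compatible(numpy_version: str) -> bool:
    """True if a NumPy version string like '1.23.5' is below the 1.24 bound."""
    match = re.match(r"(\d+)\.(\d+)", numpy_version)
    if match is None:
        raise ValueError(f"unparsable NumPy version: {numpy_version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    return major_minor < NUMPY_UPPER_BOUND


def check_installed_numpy() -> None:
    """Fail fast after installation if a too-new NumPy slipped in.

    Raises importlib.metadata.PackageNotFoundError if NumPy is not installed.
    """
    installed = version("numpy")
    if not numpy_is_compatible(installed):
        raise RuntimeError(f"numpy {installed} installed, but <1.24 is required")
```

In an image build, `check_installed_numpy()` would run right after the last `pip install`, so a dependency that silently upgrades NumPy breaks the build instead of the processor at runtime.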

The whole situation will eventually get easier, as TF starts declaring its CUDA dependencies explicitly and no longer requires Conda, allowing `pip install tensorflow[and-cuda]` instead. But again, that is unfortunately only available from TF 2.14 onwards, which does not support Python 3.8.
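
The resulting fork in installation paths can be sketched as a function that picks a pip requirement by interpreter version. It assumes only the facts above: the `and-cuda` extra exists from TF 2.14, and TF 2.14 dropped Python 3.8, leaving 2.13 as the last release for it. The function name is hypothetical, and on the Python 3.8 path the CUDA libraries still have to come from elsewhere (e.g. Conda).

```python
import sys
from typing import Tuple


def tensorflow_requirement(python: Tuple[int, int] = sys.version_info[:2]) -> str:
    """Return a pip requirement string for TF with CUDA support (sketch)."""
    if python >= (3, 9):
        # TF >= 2.14 declares its CUDA dependencies via the 'and-cuda' extra.
        return "tensorflow[and-cuda]>=2.14"
    # Python 3.8: TF is capped below 2.14, so CUDA must be provided separately.
    return "tensorflow<2.14"
```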

bertsky · 2024-02-14T22:15:00Z

I think I did find a workable compromise again. Let's see what the CD says.

Looking at this horde of CI jobs: shouldn't we skip these tests when the only changes affect the Dockerfiles (or Docker recipes)?
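
Whatever CI system runs these jobs, the decision asked about here boils down to a path filter. A minimal sketch, assuming (hypothetically) that Docker-only changes are files matching `Dockerfile*` or `.dockerignore`; the patterns would need to match the repository's real layout.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Hypothetical patterns for files that only affect Docker image builds.
DOCKER_ONLY_PATTERNS = ("Dockerfile*", "*/Dockerfile*", ".dockerignore")


def is_docker_only(path: str) -> bool:
    """True if the changed file only influences Docker builds."""
    return any(fnmatchcase(path, pattern) for pattern in DOCKER_ONLY_PATTERNS)


def needs_test_jobs(changed_paths: Iterable[str]) -> bool:
    """Run the test matrix unless every changed file is Docker-only."""
    paths = list(changed_paths)
    if not paths:
        return True  # be conservative when the change set is empty or unknown
    return not all(is_docker_only(p) for p in paths)
```

Recipes living in a shared file (such as a Makefile that also holds non-Docker rules) cannot be classified by path alone, so changes to such a file would still trigger the full test run.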

bertsky · 2024-05-25T08:55:21Z

I think we really need this, even if we don't use the variant images (`core-cuda-tf1`, `core-cuda-tf2`, `core-cuda-torch`) yet: we need the new CUDA compromise (`core-cuda`), and we can reuse the new `deps-tf1` rule for the TF1 venv in ocrd_all.

bertsky · 2024-05-25T09:22:42Z

I resolved the conflict, but the CI failure seems unrelated – we probably just need to update test assets...

bertsky · 2024-05-26T21:15:50Z

> we probably just need to update test assets...

Guessed right.

add deps-tf1 and docker-cuda-tf1

f728a97

bertsky mentioned this pull request Feb 9, 2024

Add Dockerfile OCR-D/ocrd_segment#66

Merged

kba approved these changes Feb 9, 2024


bertsky added 5 commits February 14, 2024 23:04

fix/update deps-cuda recipe

4cf0a35

fix/update deps-tf1 recipe

168ca1b

add Dockerfile.cuda-tf1

b411550

add sister rules for ocrd/core-cuda-tf2 and core-cuda-torch

310cc08

CD: try to build all Docker variants in parallel

729d438

Merge branch 'master' into add-docker-tf1

01e186b

update assets

b976723

kba approved these changes Jun 6, 2024


kba merged commit 3e021f9 into OCR-D:master Jun 7, 2024
21 checks passed
