Training German Print
german_print is a model that was trained from 2023-05-15 to 2023-05-16 to recognize printed German and Latin texts. The goal was a generic model that can be used with prints of all epochs, from 15th-century incunabula to 20th-century prints.
The latest model file german_print.mlmodel for Kraken OCR is available from Zenodo:
Weil, S., Kamlah, J., & Schmidt, T. (2023). OCR model for German prints trained from several datasets. Zenodo. https://doi.org/10.5281/zenodo.10519596
Data from all training processes including intermediate results can be found at https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/kraken/german_print/.
The training used four ground truth data sets with 676 PAGE XML files / 187482 text lines. These data sets include texts from the 15th to the 20th century, with a strong focus on 19th-century newspapers.
Most data sets include only the ground truth texts and instructions for obtaining the images, so the missing images must be downloaded before the training can start.
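One way to find the images that are still missing is to read the `imageFilename` attribute of each PAGE XML file and test whether that file exists. The following is a minimal sketch, assuming that a relative `imageFilename` is relative to the directory of the XML file; the helper `find_missing_images` is hypothetical and not part of Kraken.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def find_missing_images(xml_paths):
    """Return (xml_path, image_path) pairs whose image file does not exist.

    Assumes each PAGE XML file has a <Page imageFilename="..."> element and
    that a relative imageFilename is relative to the XML file's directory.
    """
    missing = []
    for xml_path in map(Path, xml_paths):
        root = ET.parse(xml_path).getroot()
        # Match the Page element regardless of the PAGE namespace version.
        page = next(
            (el for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "Page"),
            None,
        )
        if page is None or not page.get("imageFilename"):
            continue  # not a PAGE XML file, or no image reference
        image_path = xml_path.parent / page.get("imageFilename")
        if not image_path.exists():
            missing.append((xml_path, image_path))
    return missing
```

Called with the paths of all PAGE XML files, it returns the pairs for which `ketos compile` would probably report "Could not open file".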
Prepare a shuffled list of all PAGE XML files:
find ~/src/github/UB-Mannheim/AustrianNewspapers/data/{TrainingSet_ONB_Newseye_GT_M1+,ValidationSet_ONB_Newseye_GT_M1+}/GT-PAGE -name "*.xml" > liste
find ~/src/github/UB-Mannheim/digi-gt/PPN* -name "*.xml" >> liste
find ~/src/github/UB-Mannheim/digitue-gt/{Theo,Tue,VD18} -name "*.xml" >> liste
find ~/src/github/UB-Mannheim/reichsanzeiger-gt/data/reichsanzeiger-1820-1939/GT-PAGE -name "*.xml" >> liste
shuf liste > liste.shuf
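`shuf` produces a different order on every call. If a reproducible order is wanted, the same list can be built with a seeded shuffle. This is only a sketch of an alternative, not what was used for this training; the function name and the default seed are arbitrary.

```python
import random
from pathlib import Path


def shuffled_xml_list(roots, seed=0):
    """Collect all *.xml files below the given directories in a seeded random order."""
    # Sort first so that the result depends only on the seed, not on the
    # order in which the file system returns the entries.
    files = sorted(str(p) for root in roots for p in Path(root).rglob("*.xml"))
    random.Random(seed).shuffle(files)
    return files
```

Writing the returned paths one per line to a file gives a list that can be passed to `ketos compile --files`.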
Compile the training data:
time ketos compile --format-type xml --files liste.shuf --workers 36 -o /data/stweil/german_print2.arrow
[05/15/23 08:25:11] WARNING Could not open file /home/stweil/src/github/UB-Mannheim/digitue-gt/Tue/JfI155/2_8783e_default.jpg in arrow_dataset.py:172
/home/stweil/src/github/UB-Mannheim/digitue-gt/Tue/JfI155/2_8783e_default.xml
[05/15/23 08:25:12] WARNING Could not open file /home/stweil/src/github/UB-Mannheim/digi-gt/PPN477380670/477380670_0114.jpg in arrow_dataset.py:172
/home/stweil/src/github/UB-Mannheim/digi-gt/PPN477380670/477380670_0114.xml
[05/15/23 08:25:16] WARNING Could not open file /home/stweil/src/github/UB-Mannheim/digitue-gt/Tue/JfI155/5_ca222_default.jpg in arrow_dataset.py:172
/home/stweil/src/github/UB-Mannheim/digitue-gt/Tue/JfI155/5_ca222_default.xml
[05/15/23 08:25:17] WARNING Could not open file /home/stweil/src/github/UB-Mannheim/digi-gt/PPN477380670/477380670_0115.jpg in arrow_dataset.py:172
/home/stweil/src/github/UB-Mannheim/digi-gt/PPN477380670/477380670_0115.xml
[05/15/23 08:25:20] WARNING Could not open file /home/stweil/src/github/UB-Mannheim/digitue-gt/Tue/JfI155/4_cd6fb_default.jpg in arrow_dataset.py:172
/home/stweil/src/github/UB-Mannheim/digitue-gt/Tue/JfI155/4_cd6fb_default.xml
[05/15/23 08:26:33] WARNING Invalid line 30 in /data/stweil/src/github/UB-Mannheim/digitue-gt/Theo/Die_paepstliche_Unfehlbarkeit/2_7cd63_default.jpg: Line polygon outside of image bounds arrow_dataset.py:57
[05/15/23 08:30:48] WARNING Invalid line 41 in /data/stweil/src/github/UB-Mannheim/digitue-gt/Theo/Die_paepstliche_Unfehlbarkeit/4_b74f9_default.jpg: Line polygon outside of image bounds arrow_dataset.py:57
Extracting lines ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 192229/192274 -:--:-- -:--:--
Output file written to /data/stweil/german_print2.arrow
real 14m27,011s
user 377m0,912s
sys 83m58,846s
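The `time` output gives a rough idea of how well the 36 workers were used: (user + sys) / real is the average number of busy CPU cores. The sketch below parses the values as printed here (German locale with a decimal comma); the function names are illustrative.

```python
import re


def parse_duration(text):
    """Convert a bash `time` value such as '14m27,011s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)[,.](\d+)s", text)
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")


def average_busy_cores(real, user, sys):
    """Average number of CPU cores kept busy during the run."""
    return (parse_duration(user) + parse_duration(sys)) / parse_duration(real)


# Values from the compile run above.
print(round(average_busy_cores("14m27,011s", "377m0,912s", "83m58,846s"), 1))  # 31.9
```

So on average about 32 cores were busy while the lines were extracted.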
Start the training:
ketos train -d cuda:0 -f binary -o german_print -r 0.003 --precision 16 --batch-size 9 /data/stweil/german_print2.arrow -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]'
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA RTX A5000') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ┃ Name ┃ Type ┃ Params ┃ In sizes ┃ Out sizes ┃
┡━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 0 │ val_cer │ CharErrorRate │ 0 │ ? │ ? │
│ 1 │ val_wer │ WordErrorRate │ 0 │ ? │ ? │
│ 2 │ net │ MultiParamSequential │ 4.1 M │ [[1, 1, 120, 400], '?'] │ [[1, 284, 1, 50], '?'] │
│ 3 │ net.C_0 │ ActConv2D │ 1.3 K │ [[1, 1, 120, 400], '?'] │ [[1, 32, 120, 400], '?'] │
│ 4 │ net.Do_1 │ Dropout │ 0 │ [[1, 32, 120, 400], '?'] │ [[1, 32, 120, 400], '?'] │
│ 5 │ net.Mp_2 │ MaxPool │ 0 │ [[1, 32, 120, 400], '?'] │ [[1, 32, 60, 200], '?'] │
│ 6 │ net.C_3 │ ActConv2D │ 40.0 K │ [[1, 32, 60, 200], '?'] │ [[1, 32, 60, 200], '?'] │
│ 7 │ net.Do_4 │ Dropout │ 0 │ [[1, 32, 60, 200], '?'] │ [[1, 32, 60, 200], '?'] │
│ 8 │ net.Mp_5 │ MaxPool │ 0 │ [[1, 32, 60, 200], '?'] │ [[1, 32, 30, 100], '?'] │
│ 9 │ net.C_6 │ ActConv2D │ 55.4 K │ [[1, 32, 30, 100], '?'] │ [[1, 64, 30, 100], '?'] │
│ 10 │ net.Do_7 │ Dropout │ 0 │ [[1, 64, 30, 100], '?'] │ [[1, 64, 30, 100], '?'] │
│ 11 │ net.Mp_8 │ MaxPool │ 0 │ [[1, 64, 30, 100], '?'] │ [[1, 64, 15, 50], '?'] │
│ 12 │ net.C_9 │ ActConv2D │ 110 K │ [[1, 64, 15, 50], '?'] │ [[1, 64, 15, 50], '?'] │
│ 13 │ net.Do_10 │ Dropout │ 0 │ [[1, 64, 15, 50], '?'] │ [[1, 64, 15, 50], '?'] │
│ 14 │ net.S_11 │ Reshape │ 0 │ [[1, 64, 15, 50], '?'] │ [[1, 960, 1, 50], '?'] │
│ 15 │ net.L_12 │ TransposedSummarizingRNN │ 1.9 M │ [[1, 960, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 16 │ net.Do_13 │ Dropout │ 0 │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 17 │ net.L_14 │ TransposedSummarizingRNN │ 963 K │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 18 │ net.Do_15 │ Dropout │ 0 │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 19 │ net.L_16 │ TransposedSummarizingRNN │ 963 K │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 20 │ net.Do_17 │ Dropout │ 0 │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 21 │ net.O_18 │ LinSoftmax │ 113 K │ [[1, 400, 1, 50], '?'] │ [[1, 284, 1, 50], '?'] │
└────┴───────────┴──────────────────────────┴────────┴──────────────────────────┴──────────────────────────┘
Trainable params: 4.1 M
Non-trainable params: 0
Total params: 4.1 M
Total estimated model params size (MB): 16
stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19223/19223 0:31:36 • 0:00:00 9.95it/s val_accuracy: 0.0 val_word_accuracy: 0.0 early_stopping: 0/10 0.00000
stage 1/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19223/19223 0:31:30 • 0:00:00 9.79it/s val_accuracy: 0.0 val_word_accuracy: 0.0 early_stopping: 1/10 0.00000
stage 2/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19223/19223 0:31:49 • 0:00:00 9.67it/s val_accuracy: 0.0 val_word_accuracy: 0.0 early_stopping: 1/10 0.00000
Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━ 1458/2136 0:00:52 • 0:00:25 27.65it/s early_stopping: 1/10 0.00000
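The parameter counts in the model summary can be reproduced from the network specification. The sketch assumes that the layers behave like standard PyTorch layers: convolutions and the output layer with bias, and bidirectional LSTMs with two bias vectors per gate. With these assumptions the numbers match the table; the helper names are illustrative.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus one bias per output channel of a 2D convolution."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


def bilstm_params(input_size, hidden_size):
    """Bidirectional LSTM, PyTorch layout: 4 gates, two bias vectors per gate."""
    per_direction = 4 * (
        input_size * hidden_size + hidden_size ** 2 + 2 * hidden_size
    )
    return 2 * per_direction


def linear_params(in_features, out_features):
    """Weights plus one bias per output feature."""
    return in_features * out_features + out_features


layers = {
    "C_0": conv_params(3, 13, 1, 32),      # Cr3,13,32 on the 1-channel input
    "C_3": conv_params(3, 13, 32, 32),     # Cr3,13,32
    "C_6": conv_params(3, 9, 32, 64),      # Cr3,9,64
    "C_9": conv_params(3, 9, 64, 64),      # Cr3,9,64
    "L_12": bilstm_params(64 * 15, 200),   # reshape: 64 channels x 15 rows = 960
    "L_14": bilstm_params(400, 200),       # Lbx200, input 2 x 200
    "L_16": bilstm_params(400, 200),       # Lbx200, input 2 x 200
    "O_18": linear_params(400, 284),       # 284 output classes in this run
}
total = sum(layers.values())

for name, count in layers.items():
    print(f"{name:5s} {count:>9,d}")
print(f"total {total:>9,d}  ({total * 4 / 1e6:.1f} MB at 4 bytes per parameter)")
```

The total of 4,106,748 parameters corresponds to the 4.1 M and the estimated 16 MB in the summary. Only the output layer depends on the data set: with 188 classes it has 400 * 188 + 188 = 75,388 parameters.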
(venv3.9) stweil@ocr-02:~/src/github/mittagessen/kraken/Druckschriften$ time ketos compile --format-type xml --files austriannewspapers.shuf --workers 36 -o /data/stweil/austriannewspapers.arrow
Extracting lines ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 59851/59851 -:--:-- -:--:--
Output file written to /data/stweil/austriannewspapers.arrow
real 2m29,184s
user 63m19,140s
sys 14m7,286s
(venv3.9) stweil@ocr-02:~/src/github/mittagessen/kraken/Druckschriften$ time ketos compile --format-type xml --files reichsanzeiger-gt.shuf --workers 36 -o /data/stweil/reichsanzeiger.arrow
Extracting lines ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 119429/119429 -:--:-- -:--:--
Output file written to /data/stweil/reichsanzeiger.arrow
real 11m31,541s
user 253m44,006s
sys 103m54,935s
(venv3.9) stweil@ocr-02:~/src/github/mittagessen/kraken/Druckschriften$ time ketos compile --format-type xml --files digi-gt.shuf --workers 36 -o /data/stweil/digi-gt.arrow
Extracting lines ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99% 3198/3240 -:--:-- -:--:--
Output file written to /data/stweil/digi-gt.arrow
real 0m31,222s
user 11m33,226s
sys 0m47,420s
(venv3.9) stweil@ocr-02:~/src/github/mittagessen/kraken/Druckschriften$ time ketos compile --format-type xml --files digitue-gt.shuf --workers 36 -o /data/stweil/digitue-gt.arrow
[05/15/23 15:43:07] WARNING Invalid line 30 in /data/stweil/src/github/UB-Mannheim/digitue-gt/Theo/Die_paepstliche_Unfehlbarkeit/2_7cd63_default.jpg: Line polygon outside of image bounds arrow_dataset.py:57
[05/15/23 15:43:15] WARNING Invalid line 41 in /data/stweil/src/github/UB-Mannheim/digitue-gt/Theo/Die_paepstliche_Unfehlbarkeit/4_b74f9_default.jpg: Line polygon outside of image bounds arrow_dataset.py:57
Extracting lines ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 9876/9883 -:--:-- -:--:--
Output file written to /data/stweil/digitue-gt.arrow
real 0m38,540s
user 19m13,052s
sys 0m36,279s
(venv3.9) stweil@ocr-02:~/src/github/mittagessen/kraken/Druckschriften$ ketos train -d cuda:0 -f binary -o models/austriannewspapers -r 0.003 --precision 16 --batch-size 9 /data/stweil/austriannewspapers.arrow -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]'
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA RTX A5000') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ┃ Name ┃ Type ┃ Params ┃ In sizes ┃ Out sizes ┃
┡━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 0 │ val_cer │ CharErrorRate │ 0 │ ? │ ? │
│ 1 │ val_wer │ WordErrorRate │ 0 │ ? │ ? │
│ 2 │ net │ MultiParamSequential │ 4.1 M │ [[1, 1, 120, 400], '?'] │ [[1, 188, 1, 50], '?'] │
│ 3 │ net.C_0 │ ActConv2D │ 1.3 K │ [[1, 1, 120, 400], '?'] │ [[1, 32, 120, 400], '?'] │
│ 4 │ net.Do_1 │ Dropout │ 0 │ [[1, 32, 120, 400], '?'] │ [[1, 32, 120, 400], '?'] │
│ 5 │ net.Mp_2 │ MaxPool │ 0 │ [[1, 32, 120, 400], '?'] │ [[1, 32, 60, 200], '?'] │
│ 6 │ net.C_3 │ ActConv2D │ 40.0 K │ [[1, 32, 60, 200], '?'] │ [[1, 32, 60, 200], '?'] │
│ 7 │ net.Do_4 │ Dropout │ 0 │ [[1, 32, 60, 200], '?'] │ [[1, 32, 60, 200], '?'] │
│ 8 │ net.Mp_5 │ MaxPool │ 0 │ [[1, 32, 60, 200], '?'] │ [[1, 32, 30, 100], '?'] │
│ 9 │ net.C_6 │ ActConv2D │ 55.4 K │ [[1, 32, 30, 100], '?'] │ [[1, 64, 30, 100], '?'] │
│ 10 │ net.Do_7 │ Dropout │ 0 │ [[1, 64, 30, 100], '?'] │ [[1, 64, 30, 100], '?'] │
│ 11 │ net.Mp_8 │ MaxPool │ 0 │ [[1, 64, 30, 100], '?'] │ [[1, 64, 15, 50], '?'] │
│ 12 │ net.C_9 │ ActConv2D │ 110 K │ [[1, 64, 15, 50], '?'] │ [[1, 64, 15, 50], '?'] │
│ 13 │ net.Do_10 │ Dropout │ 0 │ [[1, 64, 15, 50], '?'] │ [[1, 64, 15, 50], '?'] │
│ 14 │ net.S_11 │ Reshape │ 0 │ [[1, 64, 15, 50], '?'] │ [[1, 960, 1, 50], '?'] │
│ 15 │ net.L_12 │ TransposedSummarizingRNN │ 1.9 M │ [[1, 960, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 16 │ net.Do_13 │ Dropout │ 0 │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 17 │ net.L_14 │ TransposedSummarizingRNN │ 963 K │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 18 │ net.Do_15 │ Dropout │ 0 │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 19 │ net.L_16 │ TransposedSummarizingRNN │ 963 K │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 20 │ net.Do_17 │ Dropout │ 0 │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 21 │ net.O_18 │ LinSoftmax │ 75.4 K │ [[1, 400, 1, 50], '?'] │ [[1, 188, 1, 50], '?'] │
└────┴───────────┴──────────────────────────┴────────┴──────────────────────────┴──────────────────────────┘
Trainable params: 4.1 M
Non-trainable params: 0
Total params: 4.1 M
Total estimated model params size (MB): 16
stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5985/5985 0:09:45 • 0:00:00 10.47it/s val_accuracy: 0.017 val_word_accuracy: 0.0 early_stopping: 0/10 0.01676
stage 1/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5985/5985 0:09:44 • 0:00:00 10.21it/s val_accuracy: 0.001 val_word_accuracy: 0.0 early_stopping: 1/10 0.01676
stage 2/∞ ━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 380/5985 0:00:39 • 0:09:38 9.71it/s val_accuracy: 0.001 val_word_accuracy: 0.0 early_stopping: 1/10 0.01676
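The number of steps per stage follows from the number of compiled lines and the batch size. The sketch assumes that 90 % of the lines are used for training and the rest for validation; this split is inferred from the numbers in the logs, not stated anywhere in them, and the exact rounding used by Kraken may differ.

```python
def steps_per_stage(num_lines, batch_size=9):
    """Return (training steps, validation steps) for an assumed 90/10 split."""
    train_lines = num_lines * 9 // 10        # assumed: 90 % training, rounded down
    val_lines = num_lines - train_lines      # assumed: remainder for validation
    ceil_div = lambda a, b: -(-a // b)       # last batch may be incomplete
    return ceil_div(train_lines, batch_size), ceil_div(val_lines, batch_size)


print(steps_per_stage(59851))    # austriannewspapers.arrow
print(steps_per_stage(192229))   # german_print2.arrow
```

For the 59851 lines of austriannewspapers.arrow this gives 5985 training steps, and for the 192229 lines of german_print2.arrow it gives 19223 training and 2136 validation steps, which are the totals shown in the progress bars.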
(venv3.9) stweil@ocr-02:~/src/github/mittagessen/kraken/Druckschriften$ ketos train -d cuda:0 -f binary -o models/austriannewspapers -r 0.0003 --precision 16 --batch-size 9 /data/stweil/austriannewspapers.arrow -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]'
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA RTX A5000') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ┃ Name ┃ Type ┃ Params ┃ In sizes ┃ Out sizes ┃
┡━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 0 │ val_cer │ CharErrorRate │ 0 │ ? │ ? │
│ 1 │ val_wer │ WordErrorRate │ 0 │ ? │ ? │
│ 2 │ net │ MultiParamSequential │ 4.1 M │ [[1, 1, 120, 400], '?'] │ [[1, 188, 1, 50], '?'] │
│ 3 │ net.C_0 │ ActConv2D │ 1.3 K │ [[1, 1, 120, 400], '?'] │ [[1, 32, 120, 400], '?'] │
│ 4 │ net.Do_1 │ Dropout │ 0 │ [[1, 32, 120, 400], '?'] │ [[1, 32, 120, 400], '?'] │
│ 5 │ net.Mp_2 │ MaxPool │ 0 │ [[1, 32, 120, 400], '?'] │ [[1, 32, 60, 200], '?'] │
│ 6 │ net.C_3 │ ActConv2D │ 40.0 K │ [[1, 32, 60, 200], '?'] │ [[1, 32, 60, 200], '?'] │
│ 7 │ net.Do_4 │ Dropout │ 0 │ [[1, 32, 60, 200], '?'] │ [[1, 32, 60, 200], '?'] │
│ 8 │ net.Mp_5 │ MaxPool │ 0 │ [[1, 32, 60, 200], '?'] │ [[1, 32, 30, 100], '?'] │
│ 9 │ net.C_6 │ ActConv2D │ 55.4 K │ [[1, 32, 30, 100], '?'] │ [[1, 64, 30, 100], '?'] │
│ 10 │ net.Do_7 │ Dropout │ 0 │ [[1, 64, 30, 100], '?'] │ [[1, 64, 30, 100], '?'] │
│ 11 │ net.Mp_8 │ MaxPool │ 0 │ [[1, 64, 30, 100], '?'] │ [[1, 64, 15, 50], '?'] │
│ 12 │ net.C_9 │ ActConv2D │ 110 K │ [[1, 64, 15, 50], '?'] │ [[1, 64, 15, 50], '?'] │
│ 13 │ net.Do_10 │ Dropout │ 0 │ [[1, 64, 15, 50], '?'] │ [[1, 64, 15, 50], '?'] │
│ 14 │ net.S_11 │ Reshape │ 0 │ [[1, 64, 15, 50], '?'] │ [[1, 960, 1, 50], '?'] │
│ 15 │ net.L_12 │ TransposedSummarizingRNN │ 1.9 M │ [[1, 960, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 16 │ net.Do_13 │ Dropout │ 0 │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 17 │ net.L_14 │ TransposedSummarizingRNN │ 963 K │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 18 │ net.Do_15 │ Dropout │ 0 │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 19 │ net.L_16 │ TransposedSummarizingRNN │ 963 K │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 20 │ net.Do_17 │ Dropout │ 0 │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 21 │ net.O_18 │ LinSoftmax │ 75.4 K │ [[1, 400, 1, 50], '?'] │ [[1, 188, 1, 50], '?'] │
└────┴───────────┴──────────────────────────┴────────┴──────────────────────────┴──────────────────────────┘
Trainable params: 4.1 M
Non-trainable params: 0
Total params: 4.1 M
Total estimated model params size (MB): 16
stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5985/5985 0:09:55 • 0:00:00 9.81it/s val_accuracy: 0.97 val_word_accuracy: 0.868 early_stopping: 0/10 0.96973
stage 1/∞ ━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━ 1947/5985 0:03:17 • 0:07:01 9.60it/s val_accuracy: 0.97 val_word_accuracy: 0.868 early_stopping: 0/10 0.96973
(venv3.9) stweil@ocr-02:~/src/github/mittagessen/kraken/Druckschriften$ ketos train -d cuda:0 -f binary -o models/austriannewspapers -r 0.0003 --precision 16 --batch-size 9 /data/stweil/austriannewspapers.arrow -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' ^C
(venv3.9) stweil@ocr-02:~/src/github/mittagessen/kraken/Druckschriften$ ketos train -d cuda:0 -f binary -o models/german_print -r 0.0003 --precision 16 --batch-size 9 /data/stweil/german_print2.arrow -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]'
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA RTX A5000') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ┃ Name ┃ Type ┃ Params ┃ In sizes ┃ Out sizes ┃
┡━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 0 │ val_cer │ CharErrorRate │ 0 │ ? │ ? │
│ 1 │ val_wer │ WordErrorRate │ 0 │ ? │ ? │
│ 2 │ net │ MultiParamSequential │ 4.1 M │ [[1, 1, 120, 400], '?'] │ [[1, 284, 1, 50], '?'] │
│ 3 │ net.C_0 │ ActConv2D │ 1.3 K │ [[1, 1, 120, 400], '?'] │ [[1, 32, 120, 400], '?'] │
│ 4 │ net.Do_1 │ Dropout │ 0 │ [[1, 32, 120, 400], '?'] │ [[1, 32, 120, 400], '?'] │
│ 5 │ net.Mp_2 │ MaxPool │ 0 │ [[1, 32, 120, 400], '?'] │ [[1, 32, 60, 200], '?'] │
│ 6 │ net.C_3 │ ActConv2D │ 40.0 K │ [[1, 32, 60, 200], '?'] │ [[1, 32, 60, 200], '?'] │
│ 7 │ net.Do_4 │ Dropout │ 0 │ [[1, 32, 60, 200], '?'] │ [[1, 32, 60, 200], '?'] │
│ 8 │ net.Mp_5 │ MaxPool │ 0 │ [[1, 32, 60, 200], '?'] │ [[1, 32, 30, 100], '?'] │
│ 9 │ net.C_6 │ ActConv2D │ 55.4 K │ [[1, 32, 30, 100], '?'] │ [[1, 64, 30, 100], '?'] │
│ 10 │ net.Do_7 │ Dropout │ 0 │ [[1, 64, 30, 100], '?'] │ [[1, 64, 30, 100], '?'] │
│ 11 │ net.Mp_8 │ MaxPool │ 0 │ [[1, 64, 30, 100], '?'] │ [[1, 64, 15, 50], '?'] │
│ 12 │ net.C_9 │ ActConv2D │ 110 K │ [[1, 64, 15, 50], '?'] │ [[1, 64, 15, 50], '?'] │
│ 13 │ net.Do_10 │ Dropout │ 0 │ [[1, 64, 15, 50], '?'] │ [[1, 64, 15, 50], '?'] │
│ 14 │ net.S_11 │ Reshape │ 0 │ [[1, 64, 15, 50], '?'] │ [[1, 960, 1, 50], '?'] │
│ 15 │ net.L_12 │ TransposedSummarizingRNN │ 1.9 M │ [[1, 960, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 16 │ net.Do_13 │ Dropout │ 0 │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 17 │ net.L_14 │ TransposedSummarizingRNN │ 963 K │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 18 │ net.Do_15 │ Dropout │ 0 │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 19 │ net.L_16 │ TransposedSummarizingRNN │ 963 K │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 20 │ net.Do_17 │ Dropout │ 0 │ [[1, 400, 1, 50], '?'] │ [[1, 400, 1, 50], '?'] │
│ 21 │ net.O_18 │ LinSoftmax │ 113 K │ [[1, 400, 1, 50], '?'] │ [[1, 284, 1, 50], '?'] │
└────┴───────────┴──────────────────────────┴────────┴──────────────────────────┴──────────────────────────┘
Trainable params: 4.1 M
Non-trainable params: 0
Total params: 4.1 M
Total estimated model params size (MB): 16
stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19223/19223 0:31:35 • 0:00:00 9.71it/s val_accuracy: 0.976 val_word_accuracy: 0.908 early_stopping: 0/10 0.97623
stage 1/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19223/19223 0:32:35 • 0:00:00 9.80it/s val_accuracy: 0.982 val_word_accuracy: 0.935 early_stopping: 0/10 0.98221
stage 2/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19223/19223 0:32:18 • 0:00:00 9.83it/s val_accuracy: 0.984 val_word_accuracy: 0.944 early_stopping: 0/10 0.98397
stage 3/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19223/19223 0:32:26 • 0:00:00 9.80it/s val_accuracy: 0.985 val_word_accuracy: 0.949 early_stopping: 0/10 0.98510
stage 4/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19223/19223 0:32:21 • 0:00:00 9.66it/s val_accuracy: 0.986 val_word_accuracy: 0.952 early_stopping: 0/10 0.98584
stage 5/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19223/19223 0:32:14 • 0:00:00 10.21it/s val_accuracy: 0.985 val_word_accuracy: 0.951 early_stopping: 1/10 0.98584
stage 6/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19223/19223 0:32:15 • 0:00:00 9.64it/s val_accuracy: 0.986 val_word_accuracy: 0.955 early_stopping: 0/10 0.98622
stage 7/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19223/19223 0:32:00 • 0:00:00 9.91it/s val_accuracy: 0.986 val_word_accuracy: 0.955 early_stopping: 0/10 0.98626
stage 8/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19223/19223 0:32:31 • 0:00:00 9.85it/s val_accuracy: 0.986 val_word_accuracy: 0.955 early_stopping: 1/10 0.98626
stage 9/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19223/19223 0:31:18 • 0:00:00 10.04it/s val_accuracy: 0.986 val_word_accuracy: 0.955 early_stopping: 1/10 0.98626
stage 9/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19223/19223 0:31:18 • 0:00:00 10.04it/s val_accuracy: 0.986 val_word_accuracy: 0.955 early_stopping: 2/10 0.98626
stage 10/∞ ━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━ 12009/19223 0:20:14 • 0:12:26 9.68it/s val_accuracy: 0.986 val_word_accuracy: 0.955 early_stopping: 2/10 0.98626
stage 10/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━ 12016/19223 0:20:15 • 0:12:23 9.70it/s val_accuracy: 0.986 val_word_accuracy: 0.955 early_stopping: 2/10 0.98626
stage 10/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━ 12019/19223 0:20:15 • 0:12:20 9.75it/s val_accuracy: 0.986 val_word_accuracy: 0.955 early_stopping: 2/10 0.98626
stage 10/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━ 12395/19223 0:20:53 • 0:11:33 9.86it/s val_accuracy: 0.986 val_word_accuracy: 0.955 early_stopping: 2/10 0.98626
stage 10/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19223/19223 0:32:21 • 0:00:00 9.53it/s val_accuracy: 0.986 val_word_accuracy: 0.955 early_stopping: 2/10 0.98626
Validation ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 238/2136 0:00:09 • 0:01:16 25.23it/s early_stopping: 2/10 0.98626
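A training log like the one above can be summarised by extracting the values from the stage lines. The sketch assumes the line layout shown in this log (the last number being the best validation accuracy so far); the regular expression and the function name are illustrative and not part of Kraken.

```python
import re

STAGE_RE = re.compile(
    r"stage (?P<stage>\d+)/\S+ .*?"
    r"val_accuracy: (?P<val_accuracy>[\d.]+) "
    r"val_word_accuracy: (?P<val_word_accuracy>[\d.]+) "
    r"early_stopping: (?P<patience>\d+)/(?P<max_patience>\d+) (?P<best>[\d.]+)"
)


def parse_stage_lines(log_text):
    """Return one record per 'stage' line, in the order the lines appear."""
    records = []
    for line in log_text.splitlines():
        match = STAGE_RE.search(line)
        if match:
            records.append({
                "stage": int(match["stage"]),
                "val_accuracy": float(match["val_accuracy"]),
                "val_word_accuracy": float(match["val_word_accuracy"]),
                "patience": int(match["patience"]),
                "max_patience": int(match["max_patience"]),
                "best": float(match["best"]),
            })
    return records
```

The `best` value of the last record is the best validation accuracy reached so far, 0.98626 at the end of the log above.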