cor-asv-ann-models

Pretrained models for the ANN-based post-correction component.

Models

OCR post-correction

All models were trained with cor-asv-ann-train --width 512 --depth 2. Each was initialised with weights from a language model via --init-model, then pretrained on 200k lines of clean text from DTA (input=output), and finally retrained via --load-model on GT4HistOCR and OCR-D GT lines processed by various OCR models (input=OCR with confidence, output=GT). The retraining step may change all weights (it is full training, not just fine-tuning) and does not reset the encoder layer weights (that is, --reset-encoder is not used).
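The training setup described above can be sketched as two command lines, assembled here in Python without running anything. This is only an illustration under assumptions: every file name is a hypothetical placeholder, passing the training data as trailing positional arguments is assumed rather than confirmed here, and the option for writing the resulting model is left out entirely. Only the flags named in the text (--width, --depth, --init-model, --load-model) are used; consult cor-asv-ann-train --help for the actual interface.

```python
# Flags shared by both runs, as stated in the text.
COMMON = ["cor-asv-ann-train", "--width", "512", "--depth", "2"]


def pretrain_command(lm_weights, clean_text):
    """Pretraining: start from language-model weights, train on clean text.

    Both arguments are hypothetical file names; passing the data file
    positionally is an assumption.
    """
    return COMMON + ["--init-model", lm_weights, clean_text]


def retrain_command(pretrained_model, ocr_gt_files):
    """Retraining: continue from the pretrained model on OCR/GT pairs.

    --reset-encoder is deliberately absent, so encoder weights are kept.
    """
    return COMMON + ["--load-model", pretrained_model] + list(ocr_gt_files)


# Hypothetical placeholder names, not files shipped with this repository.
pretrain = pretrain_command("lm.h5", "dta-clean.txt")
retrain = retrain_command("pretrained.h5", ["gt4histocr-ocr.pkl", "ocrd-gt-ocr.pkl"])

print(" ".join(pretrain))
print(" ".join(retrain))
```

Keeping the shared flags in one list makes it harder for the two runs to drift apart in width or depth, which would presumably prevent the second run from loading the first run's weights.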

Other

  • gt4histocr.s-ſ: trained on GT4HistOCR ground truth whose input side was degraded by replacing ſ with s, which encourages the network to learn to reconstruct ſ
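The degradation used for gt4histocr.s-ſ can be illustrated with a minimal Python sketch. The function name and the sample lines are hypothetical, and how the pairs are then serialised for training is not shown; the sketch only demonstrates the input/output relationship described above.

```python
def degrade_long_s(gt_line):
    """Return an (input, output) pair for one ground-truth line.

    The input has every long s (ſ) replaced by a round s; the output is
    the untouched ground truth, so a model trained on such pairs has to
    learn where ſ belongs.
    """
    return gt_line.replace("ſ", "s"), gt_line


# Hypothetical sample lines, not taken from GT4HistOCR.
samples = ["Waſſer", "das Haus"]
pairs = [degrade_long_s(line) for line in samples]

for source, target in pairs:
    print(source, "->", target)
```

Lines without ſ yield identical input and output, so they still teach the model to leave a correct round s alone.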

Evaluation

...