cor-asv-ann-models

Pretrained models for the ANN-based post-correction component.

Models

OCR post-correction

All models were trained with cor-asv-ann-train --width 512 --depth 2. They were initialised with weights from a language model via --init-model, then pretrained on 200k lines of clean text (input=output) from DTA, and finally retrained via --load-model on GT4HistOCR and OCR-D GT, processed by various OCR models (input=OCR with confidence, output=GT). The retraining stage is allowed to change all weights (it is not restricted to fine-tuning) and does not reset the encoder layer weights (i.e. it runs without --reset-encoder).

dta19.Fraktur4: on 19th century Fraktur texts (GT4HistOCR/corpus/dta19) for the Tesseract 4 model script/Fraktur
pre19.Fraktur4: on 15-18th century blackletter texts (GT4HistOCR others, OCR-D) for the Tesseract 4 model script/Fraktur
pre19.deu-frak3: on 15-18th century blackletter texts (GT4HistOCR others, OCR-D) for the Tesseract 3 model deu-frak
pre19.Latin4: on 15-18th century blackletter texts (GT4HistOCR others, OCR-D) for the Tesseract 4 model script/Latin
pre19.deu4: on 15-18th century blackletter texts (GT4HistOCR others, OCR-D) for the Tesseract 4 model deu
pre19.incunabula: on 15-18th century blackletter texts (GT4HistOCR others, OCR-D) for the Ocropus 1 model incunabula.pyrnn included in GT4HistOCR
pre19.latinhist: on 15-18th century blackletter texts (GT4HistOCR others, OCR-D) for the Ocropus 1 model latinhist.pyrnn included in GT4HistOCR
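
The two cor-asv-ann-train stages described above can be sketched as command lines. This is only an illustration, not the exact commands used for these models: all file names are hypothetical placeholders, and the --save-model option and the positional data argument are assumptions about the cor-asv-ann-train interface that should be checked against cor-asv-ann-train --help. The script only builds and prints the commands; it does not run them.

```python
import shlex

# Options shared by all models, as stated in the description above.
COMMON = ["cor-asv-ann-train", "--width", "512", "--depth", "2"]


def pretrain_cmd(lm_weights, clean_data, out_model):
    """Stage 1: initialise from language model weights, then train on
    clean text (input=output). --save-model and the positional data
    argument are assumed, not taken from this README."""
    return COMMON + ["--init-model", lm_weights,
                     "--save-model", out_model, clean_data]


def retrain_cmd(pretrained, ocr_gt_data, out_model):
    """Stage 2: continue from the pretrained model on OCR/GT pairs.
    --reset-encoder is deliberately not passed, so the encoder
    weights are kept."""
    return COMMON + ["--load-model", pretrained,
                     "--save-model", out_model, ocr_gt_data]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    stages = [
        pretrain_cmd("lm.h5", "dta-clean.txt", "pretrained.h5"),
        retrain_cmd("pretrained.h5", "ocr-gt-pairs.pkl", "retrained.h5"),
    ]
    for cmd in stages:
        print(shlex.join(cmd))  # print only; nothing is executed
```

Keeping the shared options in one list makes it harder for the two stages to end up with different --width or --depth values, which would presumably make the stage 1 weights unusable in stage 2.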

Other

gt4histocr.s-ſ: on GT4HistOCR ground truth degraded by replacing ſ with s in the input (encouraging the network to learn to reconstruct ſ)
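
As a minimal sketch of the degradation described above, the following builds one (input, output) training pair from a ground-truth line by replacing every ſ (U+017F) with s on the input side only. The function name is hypothetical, and the sketch does not reproduce the actual data preparation or file format used for this model.

```python
from typing import Tuple

LONG_S = "\u017f"  # ſ, LATIN SMALL LETTER LONG S


def degrade_long_s(gt_line: str) -> Tuple[str, str]:
    """Return (input, output): the input has every ſ replaced by s,
    the output is the unchanged ground-truth line."""
    return gt_line.replace(LONG_S, "s"), gt_line


if __name__ == "__main__":
    source, target = degrade_long_s("Waſſer")
    print(source, target)  # prints: Wasser Waſſer
```

Because a round s in the ground truth stays an s on both sides, the network has to decide from context which s in the input should become ſ.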

Evaluation

...
