OCR Image to Text Conversion

This guide explains how to convert images to text with Optical Character Recognition (OCR), using Tesseract-OCR and Python.
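As a rough illustration of the Python side, the sketch below calls the Tesseract command-line tool from Python with only the standard library. It assumes Tesseract and the Bulgarian language data are already installed as described in the steps below. The function names `build_tesseract_command` and `ocr_image` are hypothetical and are not taken from this project's own scripts. Passing `stdout` as Tesseract's output base makes it print the text instead of writing a `.txt` file.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path: str, lang: str = "bul") -> list[str]:
    """Build a tesseract CLI call (hypothetical helper) that prints text to stdout.

    Passing "stdout" as the output base tells tesseract to write the text to
    standard output instead of creating an output .txt file.
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "bul") -> str:
    """Run Tesseract on a single image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH; install tesseract-ocr first")
    if not Path(image_path).is_file():
        raise FileNotFoundError(image_path)
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        encoding="utf-8",  # Tesseract emits UTF-8, which matters for Cyrillic text
        check=True,        # raise CalledProcessError if tesseract fails
    )
    return result.stdout
```

For example, `ocr_image("scan.png")` (with `scan.png` standing in for your own image) returns the Bulgarian text. Tesseract also accepts combined language codes such as `lang="bul+eng"` for pages that mix scripts.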

Tesseract-OCR Method

Install Tesseract-OCR: sudo apt install tesseract-ocr
Download the Bulgarian trained language data file: bul.traineddata
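Move the downloaded file into the Tesseract-OCR data directory (root privileges are required): sudo mv bul.traineddata /usr/share/tesseract-ocr/4.00/tessdata/
The version directory (4.00 here) depends on the installed Tesseract release; Tesseract 5 packages use /usr/share/tesseract-ocr/5/tessdata/ instead.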
Install jq: sudo apt install jq
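To confirm that Tesseract actually picks up the Bulgarian data after it has been moved, you can check the output of `tesseract --list-langs`. The sketch below is a minimal, assumption-laden helper (the names `parse_list_langs` and `has_language` are hypothetical). It assumes the list is printed to stdout with a header line beginning "List of available languages", as Tesseract 4 and 5 do, followed by one language code per line.

```python
from __future__ import annotations

import shutil
import subprocess


def parse_list_langs(output: str) -> set[str]:
    """Extract language codes from `tesseract --list-langs` output.

    The first line is a header such as 'List of available languages (3):';
    every following non-empty line is treated as one language code.
    """
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("List of available languages"):
            continue
        langs.add(line)
    return langs


def has_language(lang: str = "bul") -> bool:
    """Return True if the installed tesseract reports the given language."""
    if shutil.which("tesseract") is None:
        return False  # tesseract itself is not installed
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return lang in parse_list_langs(result.stdout)
```

If `has_language("bul")` returns False while Tesseract is installed, the `bul.traineddata` file most likely sits in a different tessdata directory than the one your Tesseract version reads.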
