OCR dataset generator

Training data generator for text detection and text recognition. The training data is generated in the format required by each supported OCR system. The supported OCR systems are:

docTR
mmOCR
PaddleOCR

At the moment, the datasets that can be used to generate the training data are IAM, SROIE, and FUNSD.

How to use

main.py needs a JSON file that specifies the full configuration for the training data:

{
    "name": "output_folder_name",
    "task": "training_data_task",
    "datasets": [
        "dataset_1",
        "..."
    ]
}
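
A config of this shape can also be written from a script. The sketch below is only an illustration and not part of the repository: `write_config` is a hypothetical helper, and the values passed to it are placeholders, since the exact task and dataset strings accepted by main.py are not listed here.

```python
import json
from pathlib import Path


def write_config(path, name, task, datasets):
    """Write a config with the three keys shown above (hypothetical helper)."""
    config = {"name": name, "task": task, "datasets": list(datasets)}
    path = Path(path)
    # Create the parent folder (e.g. config/) if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config
```

For example, `write_config("config/config.json", "output_folder_name", "training_data_task", ["dataset_1"])` would reproduce the template above with a single dataset entry.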

To start the generation process just run:

python3 main.py --config config/config.json
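
To catch a malformed config before launching a long generation run, the file can be checked up front. This is a minimal sketch under stated assumptions, not the actual code in main.py: `parse_args`, `load_config`, and `REQUIRED_KEYS` are hypothetical names, and the only checks made are the ones implied by the config template (three keys, with `datasets` being a list).

```python
import argparse
import json

# Keys taken from the config template; assumed to be mandatory.
REQUIRED_KEYS = ("name", "task", "datasets")


def parse_args(argv=None):
    """Parse the --config option used in the command above."""
    parser = argparse.ArgumentParser(description="OCR dataset generator")
    parser.add_argument("--config", required=True, help="path to the JSON config")
    return parser.parse_args(argv)


def load_config(path):
    """Read the JSON config and check that the expected keys are present."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"config is missing keys: {missing}")
    if not isinstance(config["datasets"], list):
        raise ValueError("'datasets' must be a list")
    return config
```

Calling `load_config(parse_args().config)` at the start of a script raises a `ValueError` that names the problem when a key is missing or `datasets` is not a list.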
