OCR dataset generator

Training data generator for Text Detection and Text Recognition. The training data is generated in the format required by each supported OCR system. The supported OCR systems are:

At the moment, the datasets that can be used to generate the training data are IAM, SROIE, and FUNSD.

How to use

main.py requires a JSON file that specifies the full configuration for the training data:

{
    "name": "output_folder_name",
    "task": "training_data_task",
    "datasets": [
        "dataset_1",
        "..."
    ]
}
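A config that follows the template above can be sanity-checked before a run. The following is a minimal sketch and is not part of the project. `validate_config` and `SUPPORTED_DATASETS` are hypothetical names. The sketch also assumes that the entries in `datasets` are spelled exactly IAM, SROIE, and FUNSD, which this README does not confirm.

```python
import json
from typing import List

# Assumption: dataset identifiers in the config match the names listed in this README.
SUPPORTED_DATASETS = {"IAM", "SROIE", "FUNSD"}


def validate_config(config: dict) -> List[str]:
    """Return a list of problems found in a config dict (empty if it looks valid)."""
    errors = []
    # "name" (output folder) and "task" must be non-empty strings.
    for key in ("name", "task"):
        value = config.get(key)
        if not isinstance(value, str) or not value:
            errors.append(f"'{key}' must be a non-empty string")
    # "datasets" must be a non-empty list of supported dataset names.
    datasets = config.get("datasets")
    if not isinstance(datasets, list) or not datasets:
        errors.append("'datasets' must be a non-empty list")
    else:
        for ds in datasets:
            if ds not in SUPPORTED_DATASETS:
                errors.append(f"unsupported dataset: {ds!r}")
    return errors


example = json.loads("""
{
    "name": "output_folder_name",
    "task": "training_data_task",
    "datasets": ["IAM", "FUNSD"]
}
""")
print(validate_config(example))  # → []
```

An empty list means no problems were found. Each remaining entry describes one problem, so all issues can be reported at once instead of failing on the first.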

To start the generation process, run:

python3 main.py --config config/config.json
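The `--config` flag can be handled with the standard library's `argparse` and `json` modules. This sketch is an assumption about how such an entry point might look and is not the project's actual main.py. `build_parser` and `load_config` are hypothetical helpers.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts a required --config path."""
    parser = argparse.ArgumentParser(description="OCR dataset generator (sketch)")
    parser.add_argument("--config", required=True, help="path to the JSON configuration file")
    return parser


def load_config(path: str) -> dict:
    """Read and decode the JSON configuration file at `path`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `load_config(build_parser().parse_args(["--config", "config/config.json"]).config)` returns the configuration defined in that file as a dict. Because `--config` is marked `required=True`, `argparse` exits with a usage error if the flag is missing.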

Adding new datasets