Training data generator for Text Detection and Text Recognition. The training data is generated in the format required by each supported OCR system. The supported OCR systems are:
At the moment, the datasets that can be used to generate the training data are IAM, SROIE, and FUNSD.
The main.py script needs a JSON file in which all the configuration for the training data is specified:
{
"name": "output_folder_name",
"task": "training_data_task",
"datasets": [
"dataset_1",
"..."
]
}
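Before launching a long generation run, it can help to check that the config file has the expected shape. The following is a minimal sketch, not the validation that main.py itself performs: the helper names `validate_config` and `load_config` are hypothetical, and the only assumption is that the three keys shown above (`name`, `task`, `datasets`) are required, with `datasets` being a non-empty list of strings. The accepted values for `task` and the exact dataset identifiers are not checked, since they are not specified here.

```python
import json

# Assumption: these are the three keys from the example config above,
# with the types that example suggests.
REQUIRED_KEYS = {"name": str, "task": str, "datasets": list}


def validate_config(config):
    """Check that a parsed config has the keys and types shown in the example."""
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in config:
            raise ValueError(f"missing required key: {key!r}")
        if not isinstance(config[key], expected_type):
            raise ValueError(f"{key!r} must be of type {expected_type.__name__}")
    datasets = config["datasets"]
    if not datasets or not all(isinstance(d, str) for d in datasets):
        raise ValueError("'datasets' must be a non-empty list of strings")
    return config


def load_config(path):
    """Read a JSON config file from disk and validate it."""
    with open(path, encoding="utf-8") as f:
        return validate_config(json.load(f))
```

For example, `load_config("config/config.json")` returns the parsed dictionary, or raises `ValueError` naming the first problem it finds.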
To start the generation process just run:
python3 main.py --config config/config.json