Uses Google Cloud Vision to perform Optical Character Recognition on jpg and png images.
- Clone this repo
git clone https://github.com/ammo0110/Google-OCR
- Install the requirements
pip3 install -r requirements.txt
- Enable the Cloud Vision API from Google Cloud Platform Console. Refer to this
- Get an API key for yourself. Refer to this
- Once you have the API key, run main.py with the following arguments
python3 main.py `path_to_api_key_file` `path_to_input_image`
- `-o/--output` flag for redirecting output to a file
- Recursive mode for recognizing a complete directory of jpg/png images at once
- Multithreaded processing in case of recursive mode
For help, use
python3 main.py --help
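The recursive, multithreaded mode listed above could be sketched roughly as follows. This is an illustration and not the code in main.py: `find_images`, `process_directory`, and the `max_workers` default are hypothetical names, the `recognize` callable stands in for whatever function performs OCR on one image, and accepting `.jpeg` alongside `.jpg` is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Assumption: ".jpeg" is treated the same as ".jpg".
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(root):
    """Return all jpg/png files under root, sorted for a stable order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def process_directory(root, recognize, max_workers=4):
    """Run recognize(path) on every image under root using a thread pool.

    Returns a dict mapping each image path to the result of recognize.
    """
    images = find_images(root)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its text.
        texts = list(pool.map(recognize, images))
    return dict(zip(images, texts))
```

Threads suit this workload because each image spends most of its time waiting on a network request, not on local computation.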
Google provides two kinds of APIs for its GCP services: REST APIs and language-specific APIs. I immediately found some issues with the Python-specific APIs:
- They are written only for Python 2.
- You have to install Google Libraries on your system
- Unlike the REST APIs, the authentication process for these APIs is not decoupled from the API library itself. Although this is easier to use, I don't think this kind of approach is practical from a design point of view.
Anyway, I decided to give it a try and wrote a program for Speech Recognition using the Python-specific APIs. Then I noticed one more problem: when the internet connection fails, these APIs don't report any error.
So I concluded that for Python, using the REST APIs is the only option, since it avoids all of the problems above.
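For reference, a REST call to Cloud Vision needs nothing beyond the Python 3 standard library. The sketch below is a minimal illustration and not the code in main.py: it posts one image to the public `images:annotate` endpoint with a `TEXT_DETECTION` feature, and the helper names (`build_payload`, `extract_text`, `recognize`) and the `timeout` default are hypothetical. It assumes the API key is already available as a string. A connection failure surfaces as `urllib.error.URLError`, so it cannot pass silently.

```python
import base64
import json
import urllib.error
import urllib.parse
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_payload(image_bytes):
    """Build the JSON body for a single TEXT_DETECTION request."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response):
    """Return the recognized text from a parsed response, or '' if none."""
    results = response.get("responses", [])
    if not results:
        return ""
    annotations = results[0].get("textAnnotations", [])
    # The first annotation holds the full text of the image.
    return annotations[0]["description"] if annotations else ""


def recognize(image_path, api_key, timeout=30):
    """Send one image to the Vision REST API and return its text."""
    with open(image_path, "rb") as f:
        body = json.dumps(build_payload(f.read())).encode("utf-8")
    request = urllib.request.Request(
        VISION_URL + "?" + urllib.parse.urlencode({"key": api_key}),
        data=body,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as reply:
            return extract_text(json.load(reply))
    except urllib.error.URLError as err:
        # Covers DNS failures, refused connections, and HTTP error statuses.
        raise RuntimeError(f"Vision API request failed: {err}") from err
```

Authentication here is just a query parameter on the request, which keeps it separate from the code that builds and parses the payload.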