Tesseract OCR Based Critical Data Extraction Service

melihi/ocr_miner

Ocr Miner Service

Ocr Miner is a Tesseract-based image-to-text service.

Ocr Miner can detect the following types of data:

  • Phone Number
    • 555-543-2109
    • 0212-9876543
    • 543-987-6543
    • 222 987 6543
    • (501)234-5678
    • +90539.456.7890
  • TR identity number, US Social Security Number, European VAT number
    • BG1214317890
    • 60925736682
    • 001-26-4753
  • Credit Card
    • Visa, Mastercard, ...
    • 3530-1113-3330-0000
    • 6011000990139424
    • 5105 1051 0510 5109
  • Plate
    • USA, Germany, China, Russia, Turkey
  • Date
    • 02-02-1337
    • 02 02 1339
    • 12/02/1555
    • 22.02.1556
  • Email
    • Validated with email-validator
  • Domain
    • google.io
    • Strong validation against the IANA TLD list
  • Url
  • Hash
    • Strong validation with Shannon entropy calculation.
    • MD5
    • MD4
    • SHA1
    • SHA256
    • SHA512
    • NTLM
  • Combolist
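
The hash entry above mentions validation with a Shannon entropy calculation. The following is a minimal sketch of how such a check could work, not the code used by Ocr Miner: the function names, the length table, and the `min_entropy` threshold of 3.0 bits per character are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Optional

# Hex digest lengths for the hash types listed above.
# MD4, MD5 and NTLM all produce 32 hex characters, so length alone
# cannot tell them apart.
HASH_LENGTHS = {
    32: "MD5/MD4/NTLM",
    40: "SHA1",
    64: "SHA256",
    128: "SHA512",
}

HEX_RE = re.compile(r"^[0-9a-fA-F]+$")


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def classify_hash(candidate: str, min_entropy: float = 3.0) -> Optional[str]:
    """Return a hash family label, or None for a likely false positive.

    min_entropy is an assumed threshold, not a value taken from Ocr Miner.
    """
    if not HEX_RE.match(candidate):
        return None
    label = HASH_LENGTHS.get(len(candidate))
    if label is None:
        return None
    # Low-entropy strings such as "0000...0000" have the right shape
    # but are unlikely to be real digests.
    if shannon_entropy(candidate.lower()) < min_entropy:
        return None
    return label
```

With this sketch, a string of 32 zeros is rejected even though it has the length and alphabet of an MD5 digest, because its entropy is 0.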

Technologies

  • FastAPI
  • Docker
  • Redis
  • Tesseract
  • SQLAlchemy
  • pyvat
  • python-magic
  • email-validator
  • opencv-python-headless
  • Jinja2

How to use

  • Edit the envs/.env file:

    • HOST="psql-service-name"
    • REDIS_HOST="redis-service-name"
    • USERNAME="psql-username"
    • PASSWORD="psql-password"
    • UPLOAD_FOLDER="data/uploads"
    • CLOUDFLARE_TURNSTILE="cloudflare-private-key"
      • Also change the sitekey in ocrminer.js for Cloudflare.
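
Before starting the containers, it can help to check that every key listed above is present in the file. The sketch below is one possible way to do that with the standard library only; `parse_env`, `missing_keys`, and `REQUIRED_KEYS` are hypothetical names and are not part of Ocr Miner.

```python
# Keys taken from the envs/.env list above.
REQUIRED_KEYS = (
    "HOST",
    "REDIS_HOST",
    "USERNAME",
    "PASSWORD",
    "UPLOAD_FOLDER",
    "CLOUDFLARE_TURNSTILE",
)


def parse_env(text: str) -> dict:
    """Parse KEY="value" lines into a dict.

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Surrounding quotes around the value are removed.
    """
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

A possible usage is `missing_keys(parse_env(pathlib.Path("envs/.env").read_text()))`, which returns an empty list when the file is complete.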

Deployment

For more information, see https://fastapi.tiangolo.com/deployment/server-workers/

For a production environment:

Set docker-compose > fastapi-service > command to:

gunicorn ocr_miner.api.ocr_miner_api:APP --workers 2 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
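
As an alternative to passing every flag on the command line, Gunicorn can read the same settings from a Python config file. This is a sketch only: the file name `gunicorn.conf.py` and its placement in the project are assumptions, and the repository does not ship such a file as far as this README shows.

```python
# gunicorn.conf.py (hypothetical file, not part of the repository)
# Mirrors the flags used in the production command above.

# Address and port to listen on inside the container.
bind = "0.0.0.0:8000"

# Number of worker processes.
workers = 2

# Run the FastAPI app through Uvicorn's Gunicorn worker class.
worker_class = "uvicorn.workers.UvicornWorker"
```

With that file in place, the command would shorten to `gunicorn -c gunicorn.conf.py ocr_miner.api.ocr_miner_api:APP`.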

For local command-line development:

Set docker-compose > fastapi-service > command to:

python3.11 manage.py --api

Then start the services:

docker-compose up

Frontend

Don't forget to set the Cloudflare keys.
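
For context on why both keys matter: the sitekey is used in the browser, while the private key is used by the server to confirm the token that the widget returns. The sketch below shows a server-side check against Cloudflare's Turnstile siteverify endpoint using only the standard library. How Ocr Miner performs this check is not shown in this README, so the function names here are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify"


def build_siteverify_request(secret: str, token: str) -> urllib.request.Request:
    """Build the POST request that asks Cloudflare to validate a token.

    secret is the private key (CLOUDFLARE_TURNSTILE in envs/.env);
    token is the value submitted by the Turnstile widget in the frontend.
    """
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode("ascii")
    return urllib.request.Request(SITEVERIFY_URL, data=data, method="POST")


def verify_turnstile(secret: str, token: str, timeout: float = 5.0) -> bool:
    """Return True if Cloudflare reports the token as valid."""
    request = build_siteverify_request(secret, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return bool(payload.get("success"))
```

If the private key and the sitekey belong to different Turnstile widgets, verification fails even when the user completes the challenge.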
