[Question] How to do searchable PDF via tesserocr #264
Comments
Probably better to use OCRmyPDF for this, since it's made for exactly that use case. Tesserocr can help you perform OCR on images, but it doesn't come with extensive PDF modification utilities built in because that's outside the scope of the library.
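For reference, a minimal sketch of what the OCRmyPDF route could look like when driven from Python through its command-line tool. It assumes the `ocrmypdf` executable is installed and on PATH; the helper names `build_ocrmypdf_command` and `make_searchable` and the file names are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess


def build_ocrmypdf_command(input_path, output_path, language="eng"):
    """Return the argv list for a basic OCRmyPDF run (hypothetical helper)."""
    # -l selects the Tesseract language pack used for OCR.
    return ["ocrmypdf", "-l", language, input_path, output_path]


def make_searchable(input_path, output_path, language="eng"):
    """Run OCRmyPDF on input_path and write a searchable PDF to output_path."""
    # Assumption: ocrmypdf is installed separately; fail early if it is missing.
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf executable not found on PATH")
    subprocess.run(
        build_ocrmypdf_command(input_path, output_path, language),
        check=True,  # raise CalledProcessError if OCRmyPDF reports a failure
    )
    return output_path
```

Calling `make_searchable("scan.pdf", "scan_searchable.pdf")` would then be roughly equivalent to running `ocrmypdf -l eng scan.pdf scan_searchable.pdf` in a shell.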
You can use the |
Have you tried @sirfz's method or found a solution? I am interested in this too.
import tesserocr

tessdata_path = "tessdata"
outbase = "my_first_pdf"
image_filename = "5.png"

with tesserocr.PyTessBaseAPI(path=tessdata_path) as api:
    # Ask Tesseract to render a PDF with a text layer
    api.SetVariable("tessedit_create_pdf", "true")
    api.ProcessPages(outbase, image_filename)

After applying PR #277, this should work too:

from PIL import Image

img = Image.open(image_filename)
with tesserocr.PyTessBaseAPI(path=tessdata_path) as api:
    api.SetVariable("tessedit_create_pdf", "true")
    api.ProcessPage(outputbase=outbase,
                    image=img,
                    page_index=0,
                    filename=image_filename,
                    title="this will be title")
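The first approach above can be wrapped in a small function, sketched here under a few assumptions: tesserocr is installed, `tessdata_path` points at a valid tessdata directory, and Tesseract's PDF renderer writes its output to the output base name plus `.pdf`. The names `expected_pdf_path` and `image_to_searchable_pdf` are hypothetical.

```python
import os


def expected_pdf_path(outbase):
    """Return the file name the PDF renderer is assumed to write."""
    # Assumption: the renderer appends ".pdf" to the output base name.
    return outbase + ".pdf"


def image_to_searchable_pdf(image_filename, outbase, tessdata_path="tessdata"):
    """OCR one image and write a searchable PDF next to outbase."""
    # Imported lazily so the helper above works without tesserocr installed.
    import tesserocr

    with tesserocr.PyTessBaseAPI(path=tessdata_path) as api:
        api.SetVariable("tessedit_create_pdf", "true")
        api.ProcessPages(outbase, image_filename)

    pdf_path = expected_pdf_path(outbase)
    # Check the file on disk rather than relying on a return value.
    if not os.path.exists(pdf_path):
        raise RuntimeError("no PDF was written for " + image_filename)
    return pdf_path
```

With the values from the snippet above, `image_to_searchable_pdf("5.png", "my_first_pdf")` should leave the result in `my_first_pdf.pdf`.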
Hello guys!
I am completely new to tesseract and tesserocr. I need to make a PDF file with a text layer, a.k.a. a searchable PDF.
I found in the tesseract documentation that there is such a thing as TessPDFRenderer. Is there any way I can use it via tesserocr and PyCharm?
I looked through tesserocr.py and haven't found anything even remotely close to that.
Thank you in advance!