Get text from a PDF page excluding the page number #3992
vignesh0710 asked this question in Looking for help
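No one can know in general where the PDF's creator decided to put headers, footers, etc., including page numbers. From the PDF's perspective, all of this is just text. If the page number ends up in the last text block (with `sort=True`, blocks are sorted by vertical, then horizontal position), you can drop that block:

```python
# `page` is a PyMuPDF (fitz) Page object
blocks = page.get_text("blocks", sort=True)  # each block: (x0, y0, x1, y1, text, block_no, block_type)
if blocks and "Page" in blocks[-1][4]:  # text of the last block; adjust the test as needed
    blocks = blocks[:-1]  # ignore the last block
text = "\n".join(b[4] for b in blocks)
```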
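The `"Page" in blocks[-1][4]` test is deliberately loose. Here is a minimal sketch of a stricter check. It assumes the page-number block contains nothing but a label such as `12`, `Page 12`, `- 12 -`, `12 / 30` or `Page 3 of 10`. The pattern `PAGE_NUMBER_RE` and the helper `drop_page_number_block` are hypothetical names, not part of PyMuPDF:

```python
import re

# Hypothetical pattern: the whole block is only a page-number label, e.g.
# "12", "Page 12", "- 12 -", "12 / 30" or "Page 3 of 10" (case-insensitive).
PAGE_NUMBER_RE = re.compile(
    r"^\s*(?:page\s*)?[-\u2013]?\s*\d+\s*[-\u2013]?\s*(?:(?:/|of)\s*\d+)?\s*$",
    re.IGNORECASE,
)


def drop_page_number_block(blocks):
    """Drop the last block only if its entire text looks like a page number.

    `blocks` is assumed to be the list returned by page.get_text("blocks", sort=True):
    tuples (x0, y0, x1, y1, text, block_no, block_type).
    """
    if blocks and PAGE_NUMBER_RE.match(blocks[-1][4]):
        return blocks[:-1]
    return blocks
```

It might be used as `text = "\n".join(b[4] for b in drop_page_number_block(page.get_text("blocks", sort=True)))`. A last block that happens to be a lone number in the body text would still be dropped, so checking the block's position as well may be worthwhile.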
I'm trying to get the text from a PDF page, excluding the page number in the bottom-right corner.
Code:
This works in some cases, but it often removes more text than just the page number.
Is there a better way to exclude the page number when getting text from a page?
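When the page number shares a block with body text, dropping the whole block removes too much. One finer-grained alternative (not from the thread) works on words instead of blocks. The sketch below assumes the page number is a purely numeric word in the bottom-right corner. It takes the word tuples that `page.get_text("words", sort=True)` returns and skips only those words. The helper `text_without_corner_number` and its `corner` fraction are hypothetical and would need tuning for real documents:

```python
def text_without_corner_number(words, page_width, page_height, corner=0.15):
    """Rebuild page text, skipping numeric words in the bottom-right corner.

    `words` is assumed to be the output of page.get_text("words", sort=True):
    tuples (x0, y0, x1, y1, word, block_no, line_no, word_no).
    `corner` (hypothetical) is the fraction of the page width and height
    treated as the bottom-right corner region.
    """
    # PyMuPDF puts the origin at the top-left, so "bottom" means large y values.
    x_min = page_width * (1 - corner)
    y_min = page_height * (1 - corner)

    lines = {}  # (block_no, line_no) -> words, in first-seen order
    for x0, y0, x1, y1, word, block_no, line_no, word_no in words:
        if x0 >= x_min and y0 >= y_min and word.isdigit():
            continue  # probably the page number: skip just this word
        lines.setdefault((block_no, line_no), []).append(word)

    # Words are re-joined with single spaces; original spacing is not preserved.
    return "\n".join(" ".join(line_words) for line_words in lines.values())
```

With PyMuPDF this could be called as `text_without_corner_number(page.get_text("words", sort=True), page.rect.width, page.rect.height)`. Because only matching words are skipped, body text in the same block as the page number is kept.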