fitz - pdf to image conversion - some text characters are getting converted to junk #1627

Answered by JorjMcKie
Raxidi asked this question in Upstream Bugs

Thanks for the well-prepared report. I also received the material via e-mail.
Unfortunately, I cannot do anything about this, because it is an upstream (MuPDF) issue. Your files were created incorrectly:
They use non-embedded fonts such as Times Roman, but with Identity encoding instead of, e.g., WinAnsiEncoding. Fonts that use Identity-H encoding must be embedded, however.
Any PDF viewer exhibits this problem if you try to copy / paste the problematic text portions: it will copy garbage.
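
For anyone who wants to check their own files for the condition described above, here is a minimal sketch, not an official diagnostic. It assumes that `Document.get_page_fonts()` returns tuples starting with `(xref, ext, type, basefont, name, encoding)` and that an `ext` of `"n/a"` indicates a font with no extractable (embedded) font file, which is my reading of the PyMuPDF documentation. The helper names `find_suspect_fonts` and `scan_pdf` are hypothetical.

```python
def find_suspect_fonts(font_entries):
    """Return (xref, basefont, encoding) for fonts that use an Identity
    encoding but appear not to be embedded.

    font_entries: tuples shaped like the items of Document.get_page_fonts(),
    i.e. (xref, ext, type, basefont, name, encoding, ...).
    """
    suspects = []
    for entry in font_entries:
        xref, ext, _ftype, basefont, _name, encoding = entry[:6]
        # Assumption: ext == "n/a" means there is no embedded font file.
        not_embedded = ext == "n/a"
        # Covers both Identity-H and Identity-V; tolerate an empty encoding.
        uses_identity = (encoding or "").startswith("Identity")
        if not_embedded and uses_identity:
            suspects.append((xref, basefont, encoding))
    return suspects


def scan_pdf(path):
    """Map page number -> suspect fonts for every affected page of a PDF."""
    import fitz  # PyMuPDF, imported lazily so the helper above has no dependency

    report = {}
    with fitz.open(path) as doc:
        for pno in range(doc.page_count):
            hits = find_suspect_fonts(doc.get_page_fonts(pno))
            if hits:
                report[pno] = hits
    return report
```

Calling `scan_pdf("input.pdf")` (the file name is a placeholder) on an affected file should list the pages where such fonts occur. An empty result does not prove the file is fine, since the check only looks at the font table.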

One could argue that most viewers still render the page correctly, so why does MuPDF not do the same?
I cannot answer that, so I must refer you to MuPDF's bug tracker: https://bugs.ghostscri…

Answer selected by Raxidi
Labels: upstream bug (bug outside this package)

This discussion was converted from issue #1626 on March 07, 2022 11:41.