PdfMerger breaks PDF/A compliance #1012
Comments
The metadata section that is missing looks like this in the original file:
|
The file trailer of the original looks like this:
whereas the merged one looks like this:
So we do have the ID keyword, but we violate
|
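The two observations above (a missing XMP metadata section and the trailer's ID entry) can be spot-checked without a full validator. The following is a rough sketch using only the standard library; the function name `pdfa_markers` is hypothetical, and it assumes a classic `trailer` keyword and an uncompressed XMP packet. It is a heuristic for comparing an input file against a merged one, not a PDF/A validation.

```python
import re


def pdfa_markers(data: bytes) -> dict:
    """Report whether a few PDF/A-relevant markers appear in raw PDF bytes.

    Heuristic only: assumes an uncompressed XMP packet and a classic
    'trailer' keyword (no cross-reference stream).
    """
    # Look only at the last trailer, which describes the final file state.
    trailer_pos = data.rfind(b"trailer")
    tail = data[trailer_pos:] if trailer_pos != -1 else b""
    return {
        # XMP metadata packet present anywhere in the file
        "xmp_packet": b"<x:xmpmeta" in data,
        # PDF/A identification property (element or attribute form)
        "pdfaid_part": b"pdfaid:part" in data,
        # File identifier array in the trailer dictionary
        "trailer_id": re.search(rb"/ID\s*\[", tail) is not None,
    }
```

Running it on the raw bytes of both the original and the merged file (for example via `Path(...).read_bytes()`) and comparing the two dictionaries shows which of these markers was lost.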
https://avepdf.com/pdfa-validation might also help us |
This probably should be updated to reflect the deprecation of PdfMerger in favor of PdfWriter. |
According to
>>> from pypdf import PdfReader, PdfWriter
>>> reader = PdfReader('PDFA-in-a-Nutshell_1b.pdf')
>>> metadata = reader.metadata
>>> writer = PdfWriter(clone_from=reader)
>>> writer.add_metadata(metadata)
>>> writer.write('merged.pdf')
(True, <_io.FileIO [closed]>)
Running this through VeraPDF with the PDF/A-1A profile, I get some different issues:
Using the automatically detected profile (PDF/A-1B), only item 5 is reported. |
@stefan6419846 What do you think we should do about this issue? Close it as not planned? |
We are recommending the |
The output file with the following code passed!
|
Ideally, we find a way to check this in CI as well to ensure that our changes do not accidentally break anything about this. |
That is certainly true. |
fpdf2 seems to already have some parts of this implemented in the CI, although ignoring PDF/A issues: https://github.com/py-pdf/fpdf2/blob/7784099dadeec551aa78511c06a6d7f525428265/.github/workflows/continuous-integration-workflow.yml#L45-L58 |
We should
Can you indicate which standard you checked the document against, and which tool or website you used? |
Sorry, I chose “PDF/A-1b Basic”. I should have chosen “PDF/A-1a”. |
Seems like some words got lost here? ;) |
We should/might prepare a dedicated set of tests to confirm. However, I see two limitations: |
There are indeed multiple ways to verify this. veraPDF is a Java application and should not be a real problem in CI. For the PDF/A standard, we should start with a basic example like the file initially referenced in this issue. IMHO we never claimed to be able to generate such a file, and I have no plans to change this for now. This does not prevent us from running the basic validation mentioned before, i.e. checking that passing an existing PDF/A file through pypdf does not break compliance, simply to document the current behavior and catch side effects of other changes. |
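A CI check along these lines could be sketched as a small wrapper around the veraPDF command-line tool. This is only an illustration: the helper names are hypothetical, and the `--flavour` and `--format text` options as well as the `PASS`/`FAIL` prefix of the text report are assumptions that should be verified against the veraPDF version installed in CI.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_verapdf_command(
    pdf_path: Path, flavour: str = "1b", executable: str = "verapdf"
) -> List[str]:
    """Build the veraPDF invocation (option names are assumptions)."""
    return [executable, "--flavour", flavour, "--format", "text", str(pdf_path)]


def check_pdfa(
    pdf_path: Path, flavour: str = "1b", executable: str = "verapdf"
) -> Optional[bool]:
    """Return True/False for pass/fail, or None if veraPDF is not installed."""
    resolved = shutil.which(executable)
    if resolved is None:
        return None  # let the caller decide whether to skip the test
    result = subprocess.run(
        build_verapdf_command(pdf_path, flavour, resolved),
        capture_output=True,
        text=True,
        check=False,
    )
    # Assumed text report format: each line starts with PASS or FAIL.
    return result.stdout.lstrip().startswith("PASS")
```

A test could write the known PDF/A sample through `PdfWriter`, call `check_pdfa` on the result, and skip when it returns `None`, which would document the current behavior without claiming PDF/A generation support.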
Using PdfMerger with a single PDF/A-compliant document, I would expect almost exactly the same output file as the input file. But it is very different, and PDF/A compliance is broken.
Code + PDF
Using this as an example document: https://www.pdfa.org/wp-content/uploads/2011/08/PDFA-in-a-Nutshell_1b.pdf
And https://demo.verapdf.org/ to verify if the document is compliant.
Issues
PDFA-in-a-Nutshell_1b.pdf has 6.4 MB and is PDF/A compliant.
merged.pdf has 5.0 MB and is NOT PDF/A compliant.
verapdf.org mentions that 100 issues were detected. It lists the following 3: