Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getTextArray: Add trim check in Do command #762

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ridgey-dev
Copy link

@ridgey-dev ridgey-dev commented Feb 12, 2025

Type of pull request

  • Bug fix (involves code and configuration changes)
  • New feature (involves code and configuration changes)
  • Documentation update
  • Something else

About

See: #761
Summary: If a document contains a Do command, if returns not corresponding text with the Text Matrix that is returned by using getDataTm.

This is because the Do command object is a Smalot\PdfParser\XObject\Image object, and that returns always an empty string ''.

@k00ni k00ni added missing or incomplete functionality For something which is not a bug, but more like an incomplete feature. enhancement and removed missing or incomplete functionality For something which is not a bug, but more like an incomplete feature. labels Feb 12, 2025
@ridgey-dev
Copy link
Author

@k00ni What is the progress of this PR? Are we waiting for something to merge it?

@k00ni
Copy link
Collaborator

k00ni commented Feb 18, 2025

Thank you for this PR.

I am currently very busy and hope I can give more feedback until the end of this month. But to be quite frank, I have no knowledge in this topic. It would help if you could add a link to the specification which covers this case.

Maybe @j0k3r, @GreyWyvern, @unixnut (or someone else from the community) might have some thoughts about this, which they can share in the meantime.

@ridgey-dev
Copy link
Author

ridgey-dev commented Feb 19, 2025

I just created a fix for the specific PDF (see PR). I didn't found a specification that addresses this issue. I only see the following situation:

getTextArray returns '' for the Do command (Draw object).

getDataTm loops through all data command (and the getDataCommands skips the Docommand), and the first Tj matches the position with the first string '' but it should match with the second string {{signer1}}.

Hopefully someone can look at it as it's a blocker for a project I'm working on!

@unixnut
Copy link
Contributor

unixnut commented Feb 27, 2025

For neatness, can you please do a "git rebase" and squash your commits into one before making a PR? Especially when one of them fixes a typo as is the case here.

Thank you for your efforts and for including tests!

@k00ni
Copy link
Collaborator

k00ni commented Mar 1, 2025

@unixnut mentioned:

I think changing getTextArray() is the wrong approach. This will break functionality for people who expect elements with empty strings.

If that is the case I would decline this PR. Because of my limited time I can't look into this matter any further. But would be more than open to read any (technical) arguments or/and links which show that this change will not lead to a breaking change, even though I am not sure how rare this case might be in the wild.

There might be an argument to use a switch in the Config class, but that feels kinda nasty at the moment.

We use semantic version and our users trust the behavior of PDFParser.

@j0k3r what do you suggest we do here?

@k00ni k00ni added the stale needs decision label Mar 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement stale needs decision
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants