Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various XML, text and FB2 fixes #438

Merged
merged 4 commits into from
May 17, 2021
Merged

Conversation

poire-z
Copy link
Contributor

@poire-z poire-z commented May 14, 2021

fb2.css: ensure page break after

In case footnotes' <body> don't come with a proper <title> that would have ensured this page break.
See koreader/koreader#7625 (comment)

(Upstream) XML parsing: fix long named character references

Increase the buffer size for recognition of named character references to handle all those expected to be recognised. The longest name in the def_entity_table[] appears to be CounterClockwiseContourIntegral" so including a trailing null a buffer size of 32 is needed.
From buggins/coolreader#287

XML parsing: don't trim double spaces in attributes

Should allow closing koreader/koreader#7661 and hopefully won't cause much harm.

Fix ignore occasional space at start of line

Rework 89b0650: ignore the space even if followed by another text node; just don't ignore it for the first line where it could have some purpose (eg. to add some indentation).
See #364 (comment)


This change is Reviewable

poire-z and others added 4 commits May 14, 2021 20:08
In case footnotes' <body> don't come with a proper <title>
that would have ensured this page break.
Increase the buffer size for recognition of named character references
to handle all those expected to be recognised. The longest name in the
def_entity_table[] appears to be "CounterClockwiseContourIntegral" so
including a trailing null a buffer size of 32 is needed.
Rework 89b0650: ignore the space even if followed by
another text node; just don't ignore it for the first
line where it could have some purpose (eg. to add some
indentation).
lChar32 entname[16];
for ( k = 0; k < 16; k++ ) {
lChar32 entname[32];
for ( k = 0; k < 32; k++ ) {
Copy link
Member

@NiLuJe NiLuJe May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd possibly replace the subsequent 32 occurrences with sizeof(entname)

EDIT: Oh, wait, not a byte array. Never mind me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless CRe already has an ARRAY_SIZE-like macro somewhere?

Otherwise: koreader/koreader-base@a01e142

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I would have been embarassed :) I try to not change commits marked "(Upstream)", by principle.
Sometimes, I get a chance to tweak them in a followup commit touching some related code.
But as we're diverging again :/, this may change.)

@poire-z poire-z merged commit 4946dfa into koreader:master May 17, 2021
@poire-z poire-z deleted the various_202105 branch May 17, 2021 08:40
@poire-z
Copy link
Contributor Author

poire-z commented May 17, 2021

Bumping this after 2021.05. When is the release planned?

@Frenzie
Copy link
Member

Frenzie commented May 17, 2021

Hopefully tomorrow. ^_^

@poire-z
Copy link
Contributor Author

poire-z commented Mar 10, 2024

Mentionning a XML parsing issue, that I did not have the courage to fix in #555:

<div>test XML parser and unquoted last attribute:</diV>
1:<img hspace="20" border=5 src="lights.gif"/>
2:<img hspace="20" border=10 src=lights.gif />
3:<img hspace="20" border=10 src=lights.gif/>  <!-- this fails -->
4:<img hspace="20" src=lights.gif border=15/>

image
I think there were other cases of failure with an unquoted attribute value at end of the tag that were more destructive than this sample, ignoring the whole tag and/or its content... may be with not-self-closing tags,

edit: beware when/if we fix this: if some elements did disappear, and fixing it would make it reappear, it could shift xpath/xpointers, screwing past highlights. So, it would need a new DOM_VERSION_CURRENT, that would need to propagate to the XML parser (which we up to now never had to do).

@Frenzie
Copy link
Member

Frenzie commented Mar 10, 2024

I will say it's slightly surprising that 2 and 4 work while 3 doesn't, but eh, they're all supposed to fail in XML really. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants