Various XML, text and FB2 fixes #438

poire-z · 2021-05-14T18:15:24Z

fb2.css: ensure page break after <body>

In case the footnotes' <body> doesn't come with a proper <title>, which would otherwise have ensured this page break.
See koreader/koreader#7625 (comment)

(Upstream) XML parsing: fix long named character references

Increase the buffer size for recognition of named character references to handle all those expected to be recognised. The longest name in def_entity_table[] appears to be "CounterClockwiseContourIntegral", so, including a trailing null, a buffer size of 32 is needed.
From buggins/coolreader#287
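
The arithmetic behind the 32 can be checked at compile time. This is only a sketch: lChar32 is assumed to be a 32-bit character type (char32_t stands in for it here), and copyEntityName, kLongestEntityName and kEntityBufSize are hypothetical names for illustration, not crengine code.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: stand-in for crengine's lChar32 typedef.
using lChar32 = char32_t;

// Longest entity name cited in the commit message.
constexpr char kLongestEntityName[] = "CounterClockwiseContourIntegral";

// sizeof on a char array counts the trailing '\0': name length + 1.
constexpr std::size_t kEntityBufSize = sizeof(kLongestEntityName);

static_assert(kEntityBufSize == 32, "31 characters plus the terminating null");

// Hypothetical helper: copy an ASCII entity name into a fixed buffer.
// Returns false when the name plus its terminator would not fit.
template <std::size_t N>
bool copyEntityName(const char* name, lChar32 (&out)[N]) {
    std::size_t k = 0;
    for (; name[k] != '\0'; ++k) {
        if (k + 1 >= N) return false;  // no room left for the terminator
        out[k] = static_cast<lChar32>(name[k]);
    }
    out[k] = 0;
    return true;
}

// Usage sketch: the old size of 16 rejects the name, 32 accepts it.
inline void demoEntityBuffer() {
    lChar32 oldBuf[16];
    lChar32 newBuf[kEntityBufSize];
    assert(!copyEntityName(kLongestEntityName, oldBuf));
    assert(copyEntityName(kLongestEntityName, newBuf));
}
```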

XML parsing: don't trim double spaces in attributes

Should allow closing koreader/koreader#7661 and hopefully won't cause much harm.

Fix ignore occasional space at start of line

Rework 89b0650: ignore the space even if followed by another text node; just don't ignore it for the first line, where it could have some purpose (e.g. to add some indentation).
See #364 (comment)

NiLuJe · 2021-05-14T18:24:10Z

crengine/src/lvxml.cpp

-                lChar32 entname[16];
-                for ( k = 0; k < 16; k++ ) {
+                lChar32 entname[32];
+                for ( k = 0; k < 32; k++ ) {


I'd possibly replace the subsequent occurrences of 32 with sizeof(entname)

EDIT: Oh, wait, not a byte array. Never mind me.

Unless CRe already has an ARRAY_SIZE-like macro somewhere?

Otherwise: koreader/koreader-base@a01e142
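
For reference, the difference between sizeof and an element count on a non-byte array looks like this. A sketch only: arraySize is a hypothetical ARRAY_SIZE-like helper (not something known to exist in CRe), and char32_t again stands in for lChar32. In C++17, std::size from <iterator> does the same job.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: stand-in for crengine's lChar32 typedef.
using lChar32 = char32_t;

// Hypothetical ARRAY_SIZE-like helper: element count of a real array.
// Taking the array by reference means passing a pointer fails to compile.
template <typename T, std::size_t N>
constexpr std::size_t arraySize(const T (&)[N]) noexcept {
    return N;
}

constexpr lChar32 kEntname[32] = {};

// sizeof counts bytes, so it is 32 times the element size, not 32.
static_assert(sizeof(kEntname) == 32 * sizeof(lChar32), "sizeof counts bytes");
// The helper yields the element count regardless of element size.
static_assert(arraySize(kEntname) == 32, "element count");

// Usage sketch: a loop bound that follows the array declaration.
inline void demoArraySize() {
    lChar32 entname[32];
    for (std::size_t k = 0; k < arraySize(entname); ++k) {
        entname[k] = 0;
    }
    assert(entname[arraySize(entname) - 1] == 0);
}
```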

(I would have been embarrassed :) I try not to change commits marked "(Upstream)", on principle.
Sometimes, I get a chance to tweak them in a followup commit touching some related code.
But as we're diverging again :/, this may change.)

poire-z · 2021-05-17T08:40:26Z

Bumping this after 2021.05. When is the release planned?

Frenzie · 2021-05-17T08:42:40Z

Hopefully tomorrow. ^_^

poire-z · 2024-03-10T20:28:29Z

Mentioning an XML parsing issue that I did not have the courage to fix in #555:

<div>test XML parser and unquoted last attribute:</diV>
1:<img hspace="20" border=5 src="lights.gif"/>
2:<img hspace="20" border=10 src=lights.gif />
3:<img hspace="20" border=10 src=lights.gif/>  <!-- this fails -->
4:<img hspace="20" src=lights.gif border=15/>

I think there were other cases of failure with an unquoted attribute value at the end of the tag that were more destructive than this sample, ignoring the whole tag and/or its content... maybe with non-self-closing tags.
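
One possible way to handle case 3, as a sketch only: readUnquotedValue is a hypothetical helper, not the crengine parser, and it assumes the lenient choice of ending an unquoted value at a '/' that is directly followed by '>'. As far as I can tell, a strict HTML5 tokenizer would instead keep that '/' as part of the value, so this is a design choice rather than the one correct answer.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not crengine code: read an unquoted attribute value
// starting at `pos`, leaving `pos` on the character that ended it.
// The value stops at whitespace, at '>', or at a '/' directly followed by '>'.
std::string readUnquotedValue(const std::string& tag, std::size_t& pos) {
    std::string value;
    while (pos < tag.size()) {
        const char c = tag[pos];
        if (std::isspace(static_cast<unsigned char>(c)) || c == '>') break;
        if (c == '/' && pos + 1 < tag.size() && tag[pos + 1] == '>') break;
        value += c;  // any other character, including a lone '/', is kept
        ++pos;
    }
    return value;
}

// Usage sketch for the failing sample (case 3).
inline void demoUnquotedValue() {
    const std::string tag = "src=lights.gif/>";
    std::size_t pos = 4;  // just past "src="
    const std::string value = readUnquotedValue(tag, pos);
    assert(value == "lights.gif");
    assert(tag[pos] == '/');  // the self-closing "/>" is left for the caller
}
```

A '/' inside the value (as in a path) is kept, since only the exact sequence "/>" ends it.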

edit: beware when/if we fix this: if some elements did disappear, and fixing it would make them reappear, it could shift xpaths/xpointers, screwing up past highlights. So it would need a new DOM_VERSION_CURRENT, which would need to propagate to the XML parser (something we have never had to do up to now).
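
The version gating mentioned above could look roughly like this. Every name and number here is hypothetical (XmlParserOptions, kDomVersionWithUnquotedFix, useUnquotedValueFix); none of it is taken from crengine, and the threshold value is a placeholder.

```cpp
#include <cassert>

// Hypothetical placeholder threshold, not an actual crengine DOM version.
constexpr int kDomVersionWithUnquotedFix = 2;

// Hypothetical options struct handed to the XML parser.
struct XmlParserOptions {
    int domVersion;  // version requested for the document being loaded
};

// The new tokenizing behaviour is enabled only for documents that ask for a
// DOM version at or above the threshold, so older documents keep the node
// layout their stored xpointers were computed against.
constexpr bool useUnquotedValueFix(const XmlParserOptions& opts) {
    return opts.domVersion >= kDomVersionWithUnquotedFix;
}

// Usage sketch: an old document keeps the old parsing, a new one gets the fix.
inline void demoVersionGate() {
    assert(!useUnquotedValueFix(XmlParserOptions{1}));
    assert(useUnquotedValueFix(XmlParserOptions{2}));
}
```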

Frenzie · 2024-03-10T20:51:07Z

I will say it's slightly surprising that 2 and 4 work while 3 doesn't, but eh, they're all supposed to fail in XML really. :-)

poire-z and others added 4 commits May 14, 2021 20:08

fb2.css: ensure page break after <body>

4b40ee5

XML parsing: don't trim double spaces in attributes

40c804a

Fix ignore occasional space at start of line

445b1f9

NiLuJe reviewed May 14, 2021

NiLuJe approved these changes May 14, 2021

Frenzie approved these changes May 14, 2021

poire-z merged commit 4946dfa into koreader:master May 17, 2021

poire-z deleted the various_202105 branch May 17, 2021 08:40

This was referenced May 19, 2021

bump crengine: various XML, text and FB2 fixes koreader/koreader-base#1373

Merged

bump crengine: various XML, text and FB2 fixes koreader/koreader#7712

Merged

virxkane mentioned this pull request Jun 11, 2021

Updates from koreader/crengine buggins/coolreader#299

Merged
