CSS: parse/skip at-rules, support `@media`, `@supports` #454

poire-z · 2021-09-06T18:22:30Z

~~#### (Upstream) Minimal support for zip64~~ postponing this one for another PR

This is enough to open zip archives larger than 4GB.
From buggins/coolreader#310 - not sure we're using it as it depends on a #define - a bit of refacto to bits of code we use, hope it's fine.

XML parsing: strip leading `\n` in PRE and TEXTAREA

See #345 (comment) for details.
"For historical reasons, in the HTML syntax, a leading newline character immediately following the pre element start tag is stripped."
Bump DOM version as this can shift highlight xpointers indexes by 1. (It may still cause a shift for very old opened books that will get migrated from that very old DOM version to this one... but well, who highlights in PRE ?!! Or I could do this inconditionnally:
https://github.com/koreader/koreader/blob/ab4e27908b292ab7172f06bb2b1e203558780d96/frontend/apps/reader/modules/readerrolling.lua#L1415-L1420

Text: allow wrap on any space in "white-space: pre"

Note: there might still be something fishy with multiple spaces at boundaries between pre and non-pre text nodes, possibly overlapping.
Before, a long sequence of space was unbreakable, so the possible break was found by hyphenating the previous word (and the next break is because there's no other solution):

After:

The fix looks simple and sane - but well, with our text rendering code, we never know...
I think we have many other issues with white-space:pre (like what we do this, we don't do it in gerRenderedWidths()), but well, didn't have the courage for more...

initTableRendMethods(): avoid expensive processing

Not sure why we went with initNodeRendMethodRecursive(), it feels we just want to set it to this single node, whose inner content has already been inited, and shouldn't need them changed.
Fixes the 2nd issue noticed at #370 (comment):

More serious issue is with tables:
The table in 13.9 of https://diveintopython3.net/serializing.html would take a lot of time to render [...]
Add another layer ot tr > th > td, and it takes minutes...
It gets stuck into nested initTableRendMethods() initNodeRendMethodRecursive() recurseElementsDeepFirst(), so we probably have a real issue here.

Tables: fix handling of whitespace between nodes

Properly handle whitespaces between table element nodes according to the specs.
This fix was mostly not needed with normal tables as the XML parsers would remove these whitespace text, and the table rendering algorithm would ignore them.
But it might be needed when completing incomplete tables (where these whitespace may have been kept by the parser and should, or not, be included with what is wrapped) or when table elements are white-space:pre (which is rare).
We probably won't trigger this in real-life cases, but I had the bad idea to wrap all my "complete incomplete tables" test case in white-space:pre, and we did render mostly everything unlike Firefox.
Got to answer a few of my questions in the comments... A bit risky though (had to fight with "styles have changed in such a way..." on re-rendering), hope it will be ok.

ldomDocument: store screen size

Have LVDocView store the screen size in ldomDocument just so we'll be able to check CSS media queries like @media (min-device-width:500px) or (device-aspect-ratio:4/3)

CSS: more robust skipping of invalid content

Add a generic parsing helper skip_to_next() that properly observes the rules for matching pairs of (), [], {}, "", and '', and correctly handle escapes, as per specs.
Use it for next_property() and next_token().
Will be also of use when parsing CSS @-rules.

CSS: parse/skip at-rules, support `@media`, `@supports`

Properly parse CSS @-rules, skipping most of them (by handling their content as nested stylesheets or nested declarations) as we don't support much except @media and @supports.
Implement a AtRuleLogicalConditionParser and dedicated subclasses to parse logical combinations in @media and @supports.
Implement handling of @supports items by tweaking LVCssDeclaration::parse() and parse_content_property() so that the same code can be used for parsing and just checking if we would parse it fine.
References:
https://developer.mozilla.org/en-US/docs/Web/CSS/At-rule
https://developer.mozilla.org/en-US/docs/Web/CSS/Media_Queries/Using_media_queries
https://www.w3.org/TR/css3-mediaqueries/#media0
https://drafts.csswg.org/css-conditional/#at-supports

Some previous discussion in koreader/koreader#3179
Should allow closing koreader/koreader#7199.

CSS: handle media condition with `@import` and `<link/>`

Add support for:

  <link href="s.css" type="text/css" rel="stylesheet"
      media="screen and (orientation:portrait)"/>

and

  @import "s.css" screen and (orientation:portrait);

Tweak stylesheets handling in ldomDocumentWriters:
Parse and handle media= in <link href="foo" media="bar">, and translate them into @import url("foo") bar; in the <stylesheet> child element's text we put them in.
Rework a bit ldomDocumentFragmentWriter to make this parsing a bit clearer.

Handle media conditions in LVProcessStyleSheetImport().

No change to LVLoadStylesheetFile(), used by frontends to get the user-agent stylesheet, which still handle only a unique first @import, without media condition support.

Should allow closing koreader/koreader#3179.

CSS: fix parsing of "font-family:inherit!important"

When without any space.

LDOMNameIdMap::deserialize(): fix when >1024 items

Our max nb of elements and attributes IDs starts with 1024, but can grow if we meet more. Be sure we grow them too as needed when loading from cache.
Should avoid the issue noticed at koreader/koreader#7343 (comment).

(I apologize to the 3 users that decided to pick the nicks @media @supports @import. They are in my commit messages, so for any people that will sync their crengine fork with this repo, they may get a mail - but I guess that's not new to them and that by now, they have disabled such github mails :)

This change is

poire-z · 2021-09-06T18:34:22Z

@virxkane : I picked your commit without really studying it.
clang-tidy in our CI complains about:
crengine/src/lvstream.cpp:2770:17: warning: Potential leak of memory pointed to by 'item' [clang-analyzer-cplusplus.NewDeleteLeaks] and stuff related to your added NextOffset.
Details in https://github.com/koreader/crengine/pull/454/checks?check_run_id=3527231130

Hope I didn't do an error when picking :/ Can you have a look at this clang output and see if this is an issue (if not, we should find a way to silence it).

virxkane · 2021-09-06T19:06:03Z

I picked your commit without really studying it.
clang-tidy in our CI complains about:
crengine/src/lvstream.cpp:2770:17: warning: Potential leak of memory pointed to by 'item' [clang-analyzer-cplusplus.NewDeleteLeaks] and stuff related to your added NextOffset.
Details in https://github.com/koreader/crengine/pull/454/checks?check_run_id=3527231130

Should be fixed in https://github.com/virxkane/coolreader/commit/11d48c157e426acf8e950af674dba0dbe6037536

virxkane · 2021-09-06T19:13:53Z

not sure we're using it as it depends on a #define - a bit of refacto to bits of code we use, hope it's fine.

zip64 is only relevant if 'large file support' is enabled, so we check LVLONG_FILE_SUPPORT==1.

poire-z · 2021-09-06T19:24:42Z

Thanks! clang-tidy no longer complains about it !
(Still 2 other minor complains I can't understand, but ok.)

zip64 is only relevant if 'large file support' is enabled, so we check LVLONG_FILE_SUPPORT==1.

Yes, our build stuff vs. crsetup.h is a bit hard to grasp, but I think we don't enable it.

poire-z · 2021-09-07T15:59:39Z

Added a commit to fix the issue noticed at koreader/koreader#7343 (comment).

Frenzie · 2021-09-08T08:05:03Z

crengine/src/lvstsheet.cpp

-                                    while (*str && *str != ' ' ) {
+                                    // skip next char until we find a space or '!', that would start
+                                    // a new token or '!important" itself if stuck to previous token
+                                    while (*str && *str != ' ' && *str != '!') {


Huh, seems like something I might've noticed sooner. :-)

It is fine in the general parsing, line-height:1!important has been working without issue.

The issue was just with font-family which has some dedicated parsing, for splitting the multiple font names.
(And I was indeed lazy as you, typing font-family:inherit!important without spaces in my book style tweaks editor :)

I suppose you could call it lazy, but from a reading perspective, not a writing one.

I think it's easier to read (as long as it doesn't get too long):

a {line-height:1!important; font-weight:bold}

than

a { line-height: 1 !important; font-weight: bold; }

The reason I usually write the latter is a combination of how diffs work and that most other people seem to prefer it.

That being said, I'm not really sure about !important with or without a space. I normally don't use it, except in user styles.

Frenzie

lgtm, no immediate comments

"For historical reasons, in the HTML syntax, a leading newline character immediately following the pre element start tag is stripped." Bump DOM version as this can shift highlight xpointers indexes by 1.

Note: there might still be something fishy with multiple spaces at boundaries between pre and non-pre text nodes, possibly overlapping.

Not sure why we went with initNodeRendMethodRecursive(), it feels we just want to set it to this single node, whose inner content has already been inited, and shouldn't need them changed.

Properly handle whitespaces between table element nodes according to the specs. This fix was mostly not needed with normal tables as the XML parsers would remove these whitespace text, and the table rendering algorithm would ignore them. But it might be needed when completing incomplete tables (where these whitespace may have been kept by the parser and should, or not, be included with what is wrapped) or when table elements are white-space:pre (which is rare).

Have LVDocView store the screen size in ldomDocument just so we'll be able to check CSS media queries like @media (min-device-width:500px) or (device-aspect-ratio:4/3)

Add a generic parsing helper skip_to_next() that properly observes the rules for matching pairs of (), [], {}, "", and '', and correctly handle escapes, as per specs. Use it for next_property() and next_token(). Will be also of use when parsing CSS @-rules.

Properly parse CSS @-rules, skipping most of them (by handling their content as nested stylesheets or nested declarations) as we don't support much except @media and @supports. Implement a AtRuleLogicalConditionParser and dedicated subclasses to parse logical combinations in @media and @supports. Implement handling of @supports items by tweaking LVCssDeclaration::parse() and parse_content_property() so that the same code can be used for parsing and just checking if we would parse it fine.

Add support for: <link href="s.css" type="text/css" rel="stylesheet" media="screen and (orientation:portrait)"/> and @import "s.css" screen and (orientation:portrait); Tweak stylesheets handling in ldomDocumentWriters: Parse and handle media= in '<link href="foo" media="bar">', and translate them into '@import url("foo") bar;' in the <stylesheet> child element's text we put them in. Rework a bit ldomDocumentFragmentWriter to make this parsing a bit clearer. Handle media conditions in LVProcessStyleSheetImport(). No change to LVLoadStylesheetFile(), used by frontends to get the user-agent stylesheet, which still handle only a unique first @import, without media condition support.

When without any space.

Our max nb of elements and attributes IDs starts with 1024, but can grow if we meet more. Be sure we grow them too as needed when loading from cache.

poire-z · 2021-09-09T06:31:55Z

Postponing the zip64 stuff to an other PR, with more stuff from buggins/coolreader#312.

ptrm · 2021-09-10T10:34:22Z

That is great news! Thanks!

For the sake of css tweaks I would like to also get specific answer: @media (orientation: portrait) { body { font-size: 9pt; } } is still not supported, right? I tried this with no results, but want to double check;) I'm thinking of conditional stuff (like bigger margin on the button-facing screen edge on Forma or Libra) that could live in my tweaks instead of modified epubs :)

poire-z · 2021-09-10T11:01:25Z

This in a user style tweak works for me:

@media (orientation: portrait) { body { color: blue; } }

Note that it's the orientation of the "page" (where the BODY is rendered, excluding KOReader's own margins set via the bottom menu), and not of the "device screen".
So, in two-column mode in landscape, you may still trigger orientation: portrait

(Beware your 9pt will make your font a fixed size, and the only way to change it then is thru Zoom/DPI)

ptrm · 2021-09-10T11:30:01Z

(Beware your 9pt will make your font a fixed size, and the only way to change it then is thru Zoom/DPI)

Sure, that was just an example :)

So, in two-column mode in landscape, you may still trigger orientation: portrait

True. Same goes for width queries (match the column width), which has its upsides tbh :)

It must have been a typo with orientations, after checking max- and min-width, orientation is working :) Crazy thing is that min-color is working, too :D

I'm very glad it works inside css files too, anyway, will keep fiddling with it and report if need be :))))

poire-z · 2021-09-10T11:37:48Z

Crazy thing is that min-color is working, too :D

That is a crazy lie :) :

crengine/crengine/src/lvstsheet.cpp

Lines 1812 to 1825 in 69b3f55

    
           "color",            // We don't know the buffer type when loading a document, 
        
           "min-color",        // so we can't be accurate: we'll pretend for now we have 16M colors 
        
           "max-color", 
        
           "monochrome",       // For now, we'll say we're not monochrome, even on eInk 
        
           "grid",             // false, we're not a tty terminal with a fixed font 
        
           "scripting",        // "none", we don't support scripting 
        
           "update",           // "none", "Once it has been rendered, the layout can no longer be updated" 
        
           "overflow-inline",  // "none", no horizontal scrolling 
        
           "overflow-block",   // "paged", which is our main purpose (even if we have a scroll mode) 
        
           // Not implemented: 
        
           // "color-index", 
        
           // "min-color-index", 
        
           // "max-color-index",

ptrm · 2021-09-10T11:41:46Z

That is a crazy lie :) :

Wait, I swear on my PW4 (min-color: 12) was enabling the tweak inside the query, while (min-color: 16) wasn't :o

EDIT:

crengine/crengine/src/lvstsheet.cpp

Line 2032 in 69b3f55

int colors = 8;

Ohhh, it was 8 not 12 in min-color that worked, and it makes sense if color value is about bits per channel, which it is per spec. I mistook it for color-index :)

virxkane mentioned this pull request Sep 7, 2021

ZIP64 fixes & other... buggins/coolreader#312

Merged

poire-z force-pushed the css_at_rules branch from 42cd1e5 to 9ea32c6 Compare September 7, 2021 10:35

poire-z mentioned this pull request Sep 7, 2021

In-page classic classname footnotes only display the first paragraph of the footnote koreader/koreader#7343

Closed

Frenzie reviewed Sep 8, 2021

View reviewed changes

Frenzie approved these changes Sep 8, 2021

View reviewed changes

poire-z added 10 commits September 9, 2021 08:29

XML parsing: strip leading \n in PRE and TEXTAREA

197ee7e

"For historical reasons, in the HTML syntax, a leading newline character immediately following the pre element start tag is stripped." Bump DOM version as this can shift highlight xpointers indexes by 1.

Text: allow wrap on any space in "white-space: pre"

a187f95

Note: there might still be something fishy with multiple spaces at boundaries between pre and non-pre text nodes, possibly overlapping.

initTableRendMethods(): avoid expensive processing

538dc6d

Not sure why we went with initNodeRendMethodRecursive(), it feels we just want to set it to this single node, whose inner content has already been inited, and shouldn't need them changed.

ldomDocument: store screen size

1ce2f88

Have LVDocView store the screen size in ldomDocument just so we'll be able to check CSS media queries like @media (min-device-width:500px) or (device-aspect-ratio:4/3)

CSS: fix parsing of "font-family:inherit!important"

cd581d3

When without any space.

LDOMNameIdMap::deserialize(): fix when >1024 items

9962acd

Our max nb of elements and attributes IDs starts with 1024, but can grow if we meet more. Be sure we grow them too as needed when loading from cache.

poire-z force-pushed the css_at_rules branch from a379b02 to 9962acd Compare September 9, 2021 06:29

poire-z merged commit 69b3f55 into koreader:master Sep 9, 2021

poire-z deleted the css_at_rules branch September 9, 2021 06:32

This was referenced Sep 9, 2021

bump crengine: CSS: parse/skip at-rules koreader/koreader-base#1408

Merged

bump crengine: CSS: parse/skip at-rules koreader/koreader#8197

Merged

hius07 mentioned this pull request Sep 14, 2021

Justify txt koreader/koreader#8216

Closed

poire-z mentioned this pull request Nov 6, 2021

CSS: some fixes and optimisations #458

Merged

poire-z mentioned this pull request Jul 16, 2022

Monospace and fallback fonts tweaks #486

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CSS: parse/skip at-rules, support `@media`, `@supports` #454

CSS: parse/skip at-rules, support `@media`, `@supports` #454

poire-z commented Sep 6, 2021 •

edited

Loading

poire-z commented Sep 6, 2021

virxkane commented Sep 6, 2021

virxkane commented Sep 6, 2021

poire-z commented Sep 6, 2021

poire-z commented Sep 7, 2021

Frenzie Sep 8, 2021

poire-z Sep 8, 2021

Frenzie Sep 8, 2021 •

edited

Loading

Frenzie Sep 8, 2021

Frenzie left a comment

poire-z commented Sep 9, 2021

ptrm commented Sep 10, 2021

poire-z commented Sep 10, 2021

ptrm commented Sep 10, 2021 •

edited

Loading

poire-z commented Sep 10, 2021

ptrm commented Sep 10, 2021 •

edited

Loading

CSS: parse/skip at-rules, support @media, @supports #454

CSS: parse/skip at-rules, support @media, @supports #454

Conversation

poire-z commented Sep 6, 2021 • edited Loading

XML parsing: strip leading \n in PRE and TEXTAREA

Text: allow wrap on any space in "white-space: pre"

initTableRendMethods(): avoid expensive processing

Tables: fix handling of whitespace between nodes

ldomDocument: store screen size

CSS: more robust skipping of invalid content

CSS: parse/skip at-rules, support @media, @supports

CSS: handle media condition with @import and <link/>

CSS: fix parsing of "font-family:inherit!important"

LDOMNameIdMap::deserialize(): fix when >1024 items

poire-z commented Sep 6, 2021

virxkane commented Sep 6, 2021

virxkane commented Sep 6, 2021

poire-z commented Sep 6, 2021

poire-z commented Sep 7, 2021

Frenzie Sep 8, 2021

Choose a reason for hiding this comment

poire-z Sep 8, 2021

Choose a reason for hiding this comment

Frenzie Sep 8, 2021 • edited Loading

Choose a reason for hiding this comment

Frenzie Sep 8, 2021

Choose a reason for hiding this comment

Frenzie left a comment

Choose a reason for hiding this comment

poire-z commented Sep 9, 2021

ptrm commented Sep 10, 2021

poire-z commented Sep 10, 2021

ptrm commented Sep 10, 2021 • edited Loading

poire-z commented Sep 10, 2021

ptrm commented Sep 10, 2021 • edited Loading

CSS: parse/skip at-rules, support `@media`, `@supports` #454

CSS: parse/skip at-rules, support `@media`, `@supports` #454

poire-z commented Sep 6, 2021 •

edited

Loading

XML parsing: strip leading `\n` in PRE and TEXTAREA

CSS: parse/skip at-rules, support `@media`, `@supports`

CSS: handle media condition with `@import` and `<link/>`

Frenzie Sep 8, 2021 •

edited

Loading

ptrm commented Sep 10, 2021 •

edited

Loading

ptrm commented Sep 10, 2021 •

edited

Loading