-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTML API: Serialize normative html5 doctype or nothing #7793
base: trunk
Are you sure you want to change the base?
HTML API: Serialize normative html5 doctype or nothing #7793
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. A document without a doctype declaration is automatically in quirks mode?
⤷ Anything else
If the document is not an iframe srcdoc document, then this is a parse error; if the parser cannot change the mode flag is false, set the Document to quirks mode.
So if we ignore the token we get a quirks-mode document? But then we don’t positively identify it as such, we instead let it be without a doctype? Do you think there's still value in positively affirming that it’s in quirks mode? I’m thinking of systems that go on to do further work with the HTML
Test using WordPress PlaygroundThe changes in this pull request can previewed and tested using a WordPress Playground instance. WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser. Some things to be aware of
For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation. |
|
That is my understanding. Processing a DOCTYPE token can result in wordpress-develop/src/wp-includes/html-api/class-wp-html-processor.php Lines 1330 to 1353 in a27e6a8
|
This is the question 🙂 If we're going to have a quirks-mode DOCTYPE, I think it should either be omission or the (normalized) input DOCTYPE. In the same way that There remains the issue of limited-quirks DOCTYPEs, and those probably need to have their (normalized) DOCTYPES preserved given it's unclear what the ideal limited-quirks DOCTYPE would be. I don't feel strongly about any of these, or even that this PR is a better approach. I'm open to being convinced. DOCTYPEs do contain information that's accessible in the browsers. They can be normalized to a standard form, but I'm a bit reluctant to perform this "aggressive" normalization to quirks/limited-quirks/no-quirks DOCTYPEs. |
they do, and this is one reason that I think it’s proper to normalize them as well. talking about predictability and improved reliability for unreliable legacy and new systems, it makes me wonder who we’re benefiting by passing along a dorktype. do we know the effects of limited-quirks mode? so far we don’t even track that do we? I guess I’m comfortable leaving it a known potential bug until we understand the implications of it, but the fix is trivial — add that extra case.
for all fragment parsers this is the default behavior. we’re considering the wholesale rewrite of a full document in this patch. perhaps that’s good reason to remind ourselves that the impact of this existing behavior and change is minimal. still, it worries me leaving things ambiguous, since the same HTML could appear as standards-mode in which case it would parse differently. there’d be no clue to the provenance of the document as a quirks- or limited-quirks document. this is also why I thought about using a new public identifier linking back to WordPress — any downstream system could see that and link back to the HTML API. “Oh! this is quirks-mode produced by WordPress, that’s why it’s this way…” Maybe my worries are overblown. |
The DOCTYPE parser does recognize limited-quirks doctypes correctly according to the spec. The HTML processor doesn't care about it because it does not affect parsing or tree construction, I believe it only affects rendering. I believe limited quirks keeps some of the rendering peculiarities of quirks mode, but does not include things like case-insensitive class selectors or the P > TABLE tree construction difference that's in quirks mode. Here's an example of a rendering and/or tree construction differences:
Chrome, Firefox, and Safari all continue to support limited-quirks mode.
|
In my opinion, the safest, least surprising, and most obvious thing to do is serialize a normalized version of the DOCTYPE token as was done in #7780. This respects the input and maintains the no-/limited-/quirks mode of the document. I don't think we can go wrong preserving these aspects of the original document. I could be convinced that replacing no-quirks doctypes with the standard Finally, I'm really not comfortable with changing a document's mode from limited-quirks, which probably means that limited-quirks must respect the original doctype, which leads me back to thinking that the original doctype should just be respected. |
you’ve made a compelling case for preserving limited-quirks mode, though I’m just going to take you at your word that those screenshots are different — I don’t see it. I’m still for radical rewriting though. In For the purpose of normalization and predictability, I still like the idea of pre-defined DOCTYPE tokens, just now with one known to create limited quirks. That breaks the reference to WordPress (since any new DOCTYPE will indicate quirks-mode) but it would also mean that all documents produced will have one of three very-easily searchable DOCTYPE declarations. Again, this is an issue on the periphery I think, as most serializations will likely occur on fragments and most will be HTML5 (or should be). If not for the complexity involved I’d far prefer we produce HTML5 standards-mode documents for all inputs, but alas, that seems intractable (okay maybe I’ll be eating these words in a few months when we do it). That means I will support your choice, but I think that preserving “dorktypes” is a misunderstanding of the goal of |
See this comment
Trac ticket: https://core.trac.wordpress.org/ticket/62396
This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.