-
-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(parse/html): handle unclosed elements more gracefully #5063
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to interject the PR, but this "fix" isn't obvious because no tests changed, other than the parser snapshots which is just a consequence of the new grammar.
For example, if the closing element is optional, it shouldn't emit diagnostics, instead the snapshots are still the same.
The formatting "bug" hasn't changed, I would expect something different from the snapshots, or at least we should have new one to cover future regressions.
@@ -84,7 +84,7 @@ HtmlSelfClosingElement = | |||
HtmlElement = | |||
opening_element: HtmlOpeningElement | |||
children: HtmlElementList | |||
closing_element: HtmlClosingElement | |||
closing_element: HtmlClosingElement? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't correct though, by spec a closing element should always be there. If we require handling some error case, we should use bogus nodes instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The formatter needs to be able to add closing tags to unclosed elements. See prettier's output: https://biomejs.dev/playground/?lintRules=all&files.main.html=PABkAGkAdgA%2BAA%3D%3D
The reason I did it like this is that that bogus nodes are not structured like normal nodes, and so it would be harder to extract the tag name so that the closing tag can be added. HTML_BOGUS_ELEMENT
doesn't necessarily have a HTML_OPENING_ELEMENT
when it occurs.
(Also, fun fact, the HTML spec does allow some elements to omit their closing tag, like <tr>
and <td>
. Prettier's parser doesn't handle that though. See the code examples here: https://html.spec.whatwg.org/multipage/tables.html#the-table-element)
I'll look into it some more and add some tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this comes from the fact that browsers can "fix" the HTML, this means that having something like the following snippet in valid in the browser.
<div><div><div><div><div><span><div><td><tr>
However, the w3c validator emits errors if the closing element is missing 🤔
I know that Astro parses the HTML as it was the browser, so it patches it during the compilation. I suppose it makes sense for a compiler, but I'm not sure it makes sense for a formatter.
I am torn about the change. There's also to note that Prettier uses a fork of the angular HTML parser, so we should expect that the parser is made for angular in the first place.
Maybe we could evaluate some options for HTML parsing (a strict one, where opening elements are mandatory, and a loose one; we can discuss it later). If you want to move forward with this change, that's fine. However, we need to change the parsing logic and not emit a diagnostic if the closing element is missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, as far as I'm aware, the "auto fixing" that browsers do is just a thing browsers do and not actually specified in the html spec. The behavior I was talking about is the one that where if you give it <td> foo <td> bar
it will actually result in <td> foo </td> <td> bar </td>
(2 sibling tags) and not <td> foo <td> bar </td> </td>
(where the first is the parent of the second). But I digress.
Would it make sense to have a new node defined like this?
HtmlElementUnclosed =
opening_element: HtmlOpeningElement
children: HtmlElementList
I attempted this in another branch and encountered some difficulties with the parser assigning the closing tag to the wrong element in cases like this:
<div>
<span>
</div>
Where it would assign the </div>
to the <span>
instead of the div
, resulting in div
becoming the unclosed element in the AST rather than the span
.
CodSpeed Performance ReportMerging #5063 will not alter performanceComparing Summary
|
Summary
This PR makes it so the HTML parser can handle unclosed elements without failing to parse outright. It attempts to resolve the
SyntaxError
failures from the prettier tests in #5056.Test Plan
The related prettier tests parse successfully, and the existing tests emit diagnostics.