
well-formedness #848

Open
faassen opened this issue Feb 27, 2025 · 0 comments

faassen commented Feb 27, 2025

My understanding is that quick-xml supports reading XML but does not check for all well-formedness errors: it reports many of them, but not all. Since I'm interested in well-formedness, I'm curious whether we could have a reader, layered over the existing ones, that does check for well-formedness.

I'd like to make an inventory of what's missing:

  • while IllFormedError::MissingEndTag exists, the reader does not normally produce it; it is only reported when read_to_end is called explicitly.
  • illegal content at the top level, such as multiple root elements or (non-whitespace) text nodes. Note that when dealing with XML fragments (in the sense implied here; "fragment" has multiple meanings), multiple elements and text nodes at the top level are allowed.
  • having a declaration without content is currently accepted
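To make the top-level items above concrete, here is a minimal sketch of a document-level check that a layered reader could run over the event stream. It is self-contained so it runs without the crate: `Ev` is a hypothetical, simplified stand-in loosely modeled on quick-xml's event variants, and `TopLevelError` and `check_top_level` are likewise hypothetical names, not quick-xml API. It assumes "a declaration without content" means a document with no root element, and it assumes tag balance is checked separately. CDATA and DOCTYPE are left out for brevity.

```rust
/// Hypothetical, simplified stand-in for quick-xml's events (not the crate's real type).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Ev {
    Decl,
    Start(String),
    End(String),
    Empty(String),
    Text(String),
    Comment,
    PI,
    Eof,
}

/// Hypothetical top-level well-formedness errors.
#[derive(Debug, PartialEq)]
enum TopLevelError {
    /// The XML declaration appeared somewhere other than the very first event.
    MisplacedDecl,
    /// A second element appeared at the top level.
    MultipleRoots,
    /// Non-whitespace text before or after the root element.
    TextOutsideRoot,
    /// No root element at all (for example, only a declaration).
    MissingRoot,
}

/// Checks document-level structure only; start/end balance is assumed
/// to be checked elsewhere (for example with a stack of open tags).
fn check_top_level(events: &[Ev]) -> Result<(), TopLevelError> {
    let mut depth: usize = 0; // number of currently open elements
    let mut roots: usize = 0; // number of top-level elements seen so far
    for (i, ev) in events.iter().enumerate() {
        match ev {
            // The XML declaration, if present, must be the very first thing.
            Ev::Decl if i != 0 => return Err(TopLevelError::MisplacedDecl),
            // An element at depth 0 is a root element; only one is allowed.
            Ev::Start(_) | Ev::Empty(_) if depth == 0 => {
                roots += 1;
                if roots > 1 {
                    return Err(TopLevelError::MultipleRoots);
                }
                if let Ev::Start(_) = ev {
                    depth += 1;
                }
            }
            Ev::Start(_) => depth += 1,
            Ev::End(_) => depth = depth.saturating_sub(1),
            // Outside the root only XML whitespace (space, tab, CR, LF) is allowed.
            Ev::Text(t)
                if depth == 0 && !t.chars().all(|c| matches!(c, ' ' | '\t' | '\r' | '\n')) =>
            {
                return Err(TopLevelError::TextOutsideRoot);
            }
            Ev::Eof => break,
            // Comments, PIs, whitespace and anything inside the root pass this check.
            _ => {}
        }
    }
    if roots == 0 {
        Err(TopLevelError::MissingRoot)
    } else {
        Ok(())
    }
}
```

For fragment parsing, the multiple-root, top-level-text and missing-root checks would simply be switched off, leaving only the declaration-position check.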

What other aspects of well-formedness did I miss that quick-xml currently does not check? There is a whole cluster of them around entities, but I'm okay with ignoring DTDs entirely.
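Even with DTDs ignored, one entity-related check still applies: in a document without a DTD, the only entity references that need no declaration are the five predefined ones (lt, gt, amp, apos, quot), and a character reference must name a legal XML Char. A minimal sketch, assuming access to the raw, undecoded text of a text node or attribute value; `check_refs`, `parse_char_ref` and `RefError` are hypothetical names, not quick-xml API, and entity-name syntax itself is not validated.

```rust
/// Hypothetical error for a reference that breaks well-formedness.
#[derive(Debug, PartialEq)]
enum RefError {
    /// An entity reference other than the five predefined ones (no DTD assumed).
    Undeclared(String),
    /// A character reference that is malformed or names an illegal Char.
    BadCharRef(String),
    /// A '&' with no terminating ';' (a bare '&' is not allowed in XML text).
    Unterminated,
}

/// Checks every `&...;` reference in raw text, assuming there is no DTD.
fn check_refs(raw: &str) -> Result<(), RefError> {
    let mut rest = raw;
    while let Some(amp) = rest.find('&') {
        let after = &rest[amp + 1..];
        let semi = after.find(';').ok_or(RefError::Unterminated)?;
        let name = &after[..semi];
        match name {
            "lt" | "gt" | "amp" | "apos" | "quot" => {}
            _ if name.starts_with('#') => {
                if parse_char_ref(&name[1..]).is_none() {
                    return Err(RefError::BadCharRef(name.to_owned()));
                }
            }
            _ => return Err(RefError::Undeclared(name.to_owned())),
        }
        rest = &after[semi + 1..];
    }
    Ok(())
}

/// Parses the part after '#' ("65" or "x41") and checks it is an XML 1.0 Char.
fn parse_char_ref(body: &str) -> Option<char> {
    let (digits, radix) = match body.strip_prefix('x') {
        Some(hex) => (hex, 16),
        None => (body, 10),
    };
    // Reject empty bodies and signs like '+', which from_str_radix would accept.
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    let code = u32::from_str_radix(digits, radix).ok()?;
    let legal = matches!(code, 0x9 | 0xA | 0xD | 0x20..=0xD7FF | 0xE000..=0xFFFD | 0x10000..=0x10FFFF);
    if legal { char::from_u32(code) } else { None }
}
```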

To check that tags are balanced, a stack of the currently open tags needs to be maintained. When considering how to do this efficiently, I noticed that the internal ReaderState appears to maintain an efficient structure for tracking which elements have been started but not yet ended. But I don't think that's exposed publicly, is it? Could it be useful for this?
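A minimal sketch of such a stack, assuming element names arrive as raw bytes. It keeps all open names in one flat byte buffer plus a vector of start offsets, so opening an element does not allocate a separate String; this is a guess at what an efficient structure could look like, not a description of ReaderState's actual internals. `OpenTags` and `BalanceError` are hypothetical, and the error variants are only loosely named after quick-xml's IllFormedError.

```rust
/// Hypothetical balance errors, loosely named after quick-xml's IllFormedError.
#[derive(Debug, PartialEq)]
enum BalanceError {
    /// End tag name differs from the innermost open element.
    Mismatched { expected: String, found: String },
    /// End tag with no open element at all.
    Unmatched(String),
    /// EOF reached while an element was still open.
    MissingEndTag(String),
}

/// Stack of open element names stored back to back in `names`;
/// `starts[i]` is the offset where the i-th open name begins.
#[derive(Default)]
struct OpenTags {
    names: Vec<u8>,
    starts: Vec<usize>,
}

impl OpenTags {
    /// Record a start tag.
    fn push(&mut self, name: &[u8]) {
        self.starts.push(self.names.len());
        self.names.extend_from_slice(name);
    }

    /// Compare an end tag with the innermost open element, then pop it.
    fn pop(&mut self, name: &[u8]) -> Result<(), BalanceError> {
        let start = match self.starts.pop() {
            Some(s) => s,
            None => return Err(BalanceError::Unmatched(lossy(name))),
        };
        let expected = &self.names[start..];
        let result = if expected == name {
            Ok(())
        } else {
            Err(BalanceError::Mismatched { expected: lossy(expected), found: lossy(name) })
        };
        self.names.truncate(start); // drop the popped name's bytes
        result
    }

    /// At EOF every element must be closed; report the innermost open one.
    fn finish(&self) -> Result<(), BalanceError> {
        match self.starts.last() {
            Some(&start) => Err(BalanceError::MissingEndTag(lossy(&self.names[start..]))),
            None => Ok(()),
        }
    }
}

/// Convert name bytes to a String for error messages.
fn lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

A layered reader could call `push` on each start event, `pop` on each end event and `finish` at EOF, which would report a missing end tag without requiring an explicit read_to_end call.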

Am I correct that quick-xml is already pretty close to providing all the pieces?
