html.parser is not a working replacement for sgmllib #10
Comments
There are numerous ways around this:
But none of these should be relevant to making design choices for the milter package.
Yeah, between porting to lxml and making html.parser work, lxml seems like the more efficient way to proceed. The requirement for any solution should be that SGMLFilter keeps the same API. That will preserve compatibility with existing milters.
Reading through the lxml docs, lxml looks unacceptable for milter applications. It apparently only knows how to build trees from SAX events and write the tree out again. What most milter applications need is the SAX events themselves. So lxml is out, and it is back to the drawing board.
I solved this for now by porting sgmllib to python3 and including it in the python3 package. The long-term solution is to make xml.sax work, or to find a good SAX API library.
It looks like xml.parsers.expat might be able to do the job. There are wrappers to harden it against malicious XML, but when you are not building a tree, the risk is low anyway. We'll need more test cases for SGMLFilter that verify it
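For reference, a minimal sketch of what event-only parsing with xml.parsers.expat could look like. The `collect_events` helper and the event tuple layout are made up for illustration and are not part of SGMLFilter or the milter package; the input is assumed to be well-formed.

```python
import xml.parsers.expat


def collect_events(data):
    """Hypothetical helper: return the parse events expat reports for data.

    No tree is built; each callback just appends a tuple to a list.
    """
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Merge adjacent character data so each text run arrives in one callback.
    parser.buffer_text = True
    parser.StartElementHandler = (
        lambda name, attrs: events.append(("start", name, attrs))
    )
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda text: events.append(("data", text))
    # Second argument True marks this as the final chunk of input.
    parser.Parse(data, True)
    return events


events = collect_events("<p>Hello <b>world</b></p>")
for event in events:
    print(event)
```

Note that expat is a strict XML parser: input that is not well-formed (an unclosed tag, for example) raises `xml.parsers.expat.ExpatError`, so a filter built on it would have to decide how to handle such input.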
The general advice has been to switch to lxml, which is also available for python2. However, my primary target system (CentOS 7) has lxml for python3 only (in EPEL).