html.parser is not a working replacement for sgmllib #10

Open
sdgathman opened this issue Sep 26, 2016 · 5 comments
Comments

@sdgathman
Owner

The general advice has been to switch to lxml, which is also available for Python 2. However, my primary target system (CentOS 7) has lxml for Python 3 only (in EPEL).

@whyscream
Contributor

There are numerous ways around this:

  • use a virtualenv
  • file an EPEL bug
  • contribute the package to EPEL
  • wait until someone else undertakes one of the above

But none of these should be relevant to making design choices for the milter package.

@sdgathman
Owner Author

Yeah, between porting to lxml and making html.parser work, lxml seems like the more efficient way to proceed. The requirement for any solution should be that SGMLFilter keeps the same API. That will preserve compatibility with existing milters.

@sdgathman
Owner Author

Having read through the lxml docs, I find it unacceptable for milter applications. It apparently only knows how to build trees from SAX events and write the tree out again. What most milter applications need is the SAX events themselves. So lxml is out. Back to the drawing board.

@sdgathman
Owner Author

I solved this for now by porting sgmllib to Python 3 and including it in the python3 package. The long-term solution is to make xml.sax work, or find a good SAX API library.
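
For reference, here is a minimal sketch of what "SAX events without a tree" looks like with the standard library's xml.sax. The names `EventRecorder` and `record_events` are made up for illustration and are not part of the milter package. It also assumes well-formed input, which real mail HTML often is not, so treat it as a picture of the event model rather than a drop-in SGMLFilter replacement.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical handler: records SAX events instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs is a mapping-like object; copy it into a plain dict.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # The parser may deliver text in several chunks; merge adjacent ones.
        if self.events and self.events[-1][0] == "data":
            self.events[-1] = ("data", self.events[-1][1] + content)
        else:
            self.events.append(("data", content))


def record_events(markup):
    """Parse a well-formed markup string and return the list of events."""
    handler = EventRecorder()
    xml.sax.parseString(markup.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    for event in record_events('<p class="x">Hi <b>there</b></p>'):
        print(event)
```

Each callback fires as the parser walks the input, so a filter can act on a tag or a run of text without keeping the whole document in memory.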

@sdgathman
Owner Author

It looks like xml.parsers.expat might be able to do the job. There are wrappers to harden it against malicious XML, but when you are not building a tree, the risk is low anyway. We'll need more test cases for SGMLFilter that verify it
a) doesn't crash
b) leaves content unchanged (except for overridden handlers)
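
A rough sketch of how checks (a) and (b) might look against xml.parsers.expat directly. This is not the SGMLFilter API; `passthrough` and `check_roundtrip` are hypothetical helper names. It relies on expat's DefaultHandler, which receives the raw text of any construct that has no more specific handler set, so with no other handlers installed the joined chunks should reproduce the input. It assumes well-formed input; expat raises ExpatError on anything else, and a real filter would need a fallback for that case.

```python
import xml.parsers.expat


def passthrough(markup):
    """Feed markup through expat and return the re-joined raw output."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # With no other handlers set, every construct reaches DefaultHandler
    # as the text that appeared in the document.
    parser.DefaultHandler = chunks.append
    parser.Parse(markup, True)
    return "".join(chunks)


def check_roundtrip(markup):
    """Return True if parsing succeeds and the content comes back unchanged."""
    try:
        output = passthrough(markup)
    except xml.parsers.expat.ExpatError:
        # Malformed input: expat refuses it rather than passing it through.
        return False
    return output == markup


if __name__ == "__main__":
    sample = '<p class="x">Hello &amp; <b>world</b></p>'
    print(check_roundtrip(sample))
    print(check_roundtrip("<p>unclosed"))
```

Overriding a specific handler (for example StartElementHandler) would take those events away from DefaultHandler, so a filter built this way has to re-emit the markup itself for the events it intercepts.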
