Python and PHP implementations of an HTML parser based on the WHATWG HTML5 specification, for maximum compatibility with major desktop web browsers. Note that the separate ports are not kept in sync; they are effectively different projects offering similar functionality for their respective languages.
Notes
- Users of the sanitizer must ensure that they serialize with quoted attribute values to avoid some known script injection holes in older browsers, including IE < 8
- The Ruby port is currently unmaintained
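To illustrate why the sanitizer note above insists on quoted attribute values, here is a minimal sketch using only the Python standard library. The helper name `serialize_start_tag` is hypothetical and is not part of html5lib; it only shows the general idea that quoting and escaping keeps an attribute value from being read as extra markup. html5lib's own serializer has a `quote_attr_values` option for this, but its accepted values differ between releases, so check the documentation for the version you use.

```python
from html import escape


def serialize_start_tag(name, attrs):
    """Hypothetical helper: build a start tag with every attribute value quoted.

    Each value is wrapped in double quotes, and &, <, > and quote characters
    are escaped, so the value cannot terminate the attribute early.
    """
    parts = [name]
    for key, value in attrs.items():
        parts.append('{}="{}"'.format(key, escape(value, quote=True)))
    return "<" + " ".join(parts) + ">"


# Written without quotes, the space in this value would end the alt attribute
# and turn "onmouseover=alert(1)" into a second attribute. Quoted, the whole
# string stays inside alt.
tag = serialize_start_tag("img", {"alt": "x onmouseover=alert(1)"})
print(tag)
```

This does not replace the sanitizer; it only covers the serialization step that the note asks users to get right.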
Python 0.95 Release Features
- Parses valid and invalid HTML documents to a tree
- Support for minidom, ElementTree (including cElementTree and lxml.etree), BeautifulSoup (deprecated) and custom simpletree output formats
- DOM to SAX converter
- Reports parse errors
- Character encoding detection
- Filtering and serializing of trees
- HTML+CSS sanitizer
- Many unit tests
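As a rough illustration of the first few features (parsing invalid HTML to an ElementTree tree and reporting parse errors), the sketch below assumes a recent html5lib release installed from PyPI. The 0.95 release described here may differ in details, and the function name `parse_with_errors` is an illustrative choice, not part of the library.

```python
from html5lib import HTMLParser, treebuilders


def parse_with_errors(markup):
    """Parse markup into an ElementTree element and collect parse errors.

    Assumes html5lib is installed. namespaceHTMLElements=False keeps tag
    names plain ("body" rather than a namespaced name), which makes the
    lookups below simpler.
    """
    builder = treebuilders.getTreeBuilder("etree")
    parser = HTMLParser(tree=builder, strict=False, namespaceHTMLElements=False)
    root = parser.parse(markup)
    return root, list(parser.errors)


# Invalid input: no doctype and unclosed <p> and <b> elements.
root, errors = parse_with_errors("<p>Hello <b>world")
print(root.tag)                       # root element of the recovered tree
print(root.find("body/p/b").text)     # text inside the unclosed <b>
print(len(errors), "parse errors reported")
```

With `strict=False` the parser recovers from errors the way a browser would and records them on `parser.errors` instead of raising.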
Documentation
- Using html5Lib
- Getting help/getting involved