html5lib
html5lib is a pure-Python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as implemented by all major web browsers.
Because html5lib is implemented in pure Python, it is significantly slower than alternatives backed by lxml (such as parsel, or beautifulsoup when used with its lxml backend).
However, html5lib follows the HTML5 parsing algorithm more faithfully, so the tree it builds represents the document more accurately than the trees produced by these alternatives.
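As a minimal sketch of this browser-like behavior, the snippet below parses a deliberately malformed fragment and inspects the resulting tree. It assumes html5lib is installed and relies on its default `etree` tree builder. The sample markup and variable names are made up for illustration.

```python
import html5lib

# Hypothetical input: no <html>, <head>, or <body>, and the first <p> is never closed.
broken_html = "<p>First paragraph<p>Second paragraph"

# namespaceHTMLElements=False keeps tag names plain ("p" rather than "{namespace}p").
# With the default "etree" tree builder, parse() returns the root <html> element.
tree = html5lib.parse(broken_html, namespaceHTMLElements=False)

# The parser supplies the missing document structure, as a browser would.
print(tree.tag)                       # html
print([child.tag for child in tree])  # ['head', 'body']

# The second <p> implicitly closes the first, so they become siblings.
paragraphs = tree.findall(".//p")
print([p.text for p in paragraphs])   # ['First paragraph', 'Second paragraph']
```

The returned object is a standard `xml.etree.ElementTree` element, so the usual `find`, `findall`, and iteration methods apply.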
Example Use
```python
import html5lib
from html5lib import parse

html_doc = "
```
Alternatives / Similar
scrapling