html5-parser

700 1 1 Apache-2.0

0.4.12 (19 Nov 2023) Jun 03 2007 17.6 thousand (month)

html5-parser is a Python library for parsing HTML and XML documents.

A fast implementation of the HTML 5 parsing spec for Python. Parsing is done in C using a variant of the gumbo parser. The gumbo parse tree is then transformed into an lxml tree, also in C, yielding parse times that can be a thirtieth of the html5lib parse times. This differs, for instance, from the gumbo Python bindings, where the initial parsing is done in C but the transformation into the final tree is done in Python.

It is built on top of the popular lxml library and returns an lxml tree, so the usual lxml API is available for working with the document's structure.

html5-parser follows the HTML5 parsing algorithm, which is more forgiving than strict XML parsing. It can therefore parse HTML documents with malformed or missing tags and still produce a usable parse tree.

To use html5-parser, first install it with `pip install html5-parser`. Once it is installed, you can call `html5_parser.parse()` to parse an HTML document and build a parse tree. For example:

```python
from html5_parser import parse

html_string = "Hello, World!"
root = parse(html_string)
print(root.tag)  # html
```

`html5_parser.parse()` accepts either a unicode string or bytes. To parse a file, read its contents first and pass them in.

Once you have a parse tree, you can search it with the lxml element methods `find()` and `findall()`, much as you would use `find()` and `find_all()` in BeautifulSoup.

Because the result is an lxml tree, it can also be searched with XPath through lxml's `xpath()` method.

Example Use

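```python
from html5_parser import parse

# The <p> wrapper is an assumption; the markup was lost from the original string.
html_string = "<p>Hello, World!</p>"
root = parse(html_string)
print(root.tag)  # html

# find() returns the first matching child element
body = root.find("body")
print("".join(body.itertext()))  # Hello, World!

# or find all matching descendants
for el in root.findall(".//p"):
    print(el.text)  # Hello, World!
```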

Alternatives / Similar

lxml

3,010 6.0.3 (2026-04-09 14:33:38) Dec 13 2022

beautifulsoup

- 4.14.3 (2025-11-30 15:08:24) Jul 26 2019

xmltodict

5,734 1.0.4 (2026-02-22 02:21:21) Jul 30 2007

html5lib

1,220 1.1 (2020-06-22 23:32:36) Jul 30 2007

cssselect

309 1.4.0 (2026-01-29 07:00:24) Apr 14 2012

feedparser

2,351 6.0.12 (2025-09-10 13:33:58) Jun 15 2007

parsel

1,324 1.11.0 (2026-01-29 07:19:22) Jul 26 2019

selectolax

1,607 0.4.7 (2026-03-06 09:23:35) Mar 01 2018

pyquery

2,381 2.0.1 (2024-08-30 08:12:22) Dec 05 2008

requests-html

13,863 0.10.0 (2019-02-17 20:14:17) Feb 25 2018

untangle

632 1.2.1 (2022-07-02 14:09:28) Jun 09 2011

scrapling

36,206 0.4.5 (2026-04-07 04:22:27) Aug 01 2024

chompjs

218 1.4.0 (2025-08-04 21:07:54) Jul 30 2007

gazpacho

768 1.1 (2020-10-09 12:50:18) Dec 28 2012

chopper

23 0.6.0 (2023-04-26 10:16:25) Jul 24 2014

Other Languages

parse5

3,886 8.0.0 (2026-02-21 19:30:52) Jul 03 2013

sax-js

1,153 1.6.0 (2026-03-17 01:32:31) Feb 09 2011

htmlparser2

4,789 12.0.0 (2026-03-20 23:08:40) Aug 28 2011

jsdom

21,552 29.0.2 (2026-04-07 03:38:38) Nov 21 2011

cheerio

30,265 1.2.0 (2026-02-21 19:30:40) Oct 08 2011

nokogiri

6,248 1.19.2 (2026-03-19 21:12:43) Jul 25 2009

xml2

223 1.5.2 (2025-12-01 15:40:00) Apr 20 2015

rvest

1,517 1.0.5 (2024-02-12 21:10:00) Nov 22 2014

html5-php

1,772 2.10.0 (2025-07-25 09:04:22) Jun 01 2013

domcrawler

4,038 v8.0.8 (2026-03-30 15:14:47) Sep 26 2011

goquery

14,926 v1.12.0 (2026-03-15 16:28:52) Aug 29 2016

cascadia

754 Start (2018-02-20 18:47:44) Feb 20 2018

htmlquery

781 v1.3.6 (2026-03-06 04:46:15) Feb 07 2019

xpath

739 v1.3.6 (2026-02-23 07:10:29) Jun 08 2019

soup

2,227 v1.2.5 (2022-01-16 14:36:54) Apr 29 2017

embed

2,103 v4.4.15 (2025-01-02 16:53:09) Oct 26 2013

simple-html-dom

- 2.0-RC2 (2019-11-09 15:42:50) Nov 09 2019

ralger

165 2.3.0 (2021-03-18 00:10:00) Dec 22 2019