html2text

1,595 8 85 GNU GPL 3
2024.2.26 (27 Feb 2024) Dec 14 2008 1.5 million (month)

html2text is a Python library that allows developers to convert HTML code into plain text. It is designed to be easy to use, and it provides several options to customize the output.

The package uses Python's built-in html.parser module to parse the HTML and then converts it to plain text.
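To illustrate the general idea of building a converter on top of html.parser, here is a minimal sketch using only the standard library. It is not html2text's actual implementation: the class LinkAwareTextExtractor and the function html_to_text are hypothetical names invented for this example, and the sketch handles only plain text and links.

```python
from html.parser import HTMLParser


class LinkAwareTextExtractor(HTMLParser):
    """Hypothetical helper: collects text and renders links as [text](href)."""

    def __init__(self, ignore_links=False):
        super().__init__()
        self.ignore_links = ignore_links
        self._parts = []
        self._href = None  # href of the <a> element currently open, if any

    def handle_starttag(self, tag, attrs):
        # Open a Markdown-style link unless links are being ignored.
        if tag == "a" and not self.ignore_links:
            self._href = dict(attrs).get("href")
            if self._href is not None:
                self._parts.append("[")

    def handle_endtag(self, tag):
        # Close the link that handle_starttag opened.
        if tag == "a" and self._href is not None:
            self._parts.append(f"]({self._href})")
            self._href = None

    def handle_data(self, data):
        # Text between tags is kept as-is.
        self._parts.append(data)

    def get_text(self):
        return "".join(self._parts).strip()


def html_to_text(html, ignore_links=False):
    """Hypothetical wrapper: parse an HTML string and return its text."""
    parser = LinkAwareTextExtractor(ignore_links=ignore_links)
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_text()


sample = "<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"
print(html_to_text(sample, ignore_links=True))
print(html_to_text(sample))
```

The sketch keeps a single piece of state (the currently open link) to show how parser callbacks can be turned into output text. A real converter also has to deal with nesting, block-level elements, whitespace, and escaping.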

html2text also comes with a CLI tool that can convert HTML files to text:

Usage: html2text [filename [encoding]]

Option              Description
--version           Show the program's version number and exit
-h, --help          Show this help message and exit
--ignore-links      Don't include any formatting for links
--escape-all        Escape all special characters. Output is less readable, but avoids corner-case formatting issues.
--reference-links   Use reference-style links instead of inline links to create Markdown
--mark-code         Mark preformatted and code blocks with [code]...[/code]

Example Use


import html2text

h = html2text.HTML2Text()

# Ignore links when converting the HTML
h.ignore_links = True
print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
# Output: Hello, world!

# Don't ignore links anymore, I like links
h.ignore_links = False
print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
# Output: Hello, [world](https://www.google.com/earth/)!
