html2text vs gofeed
html2text is a Python library that allows developers to convert HTML code into plain text. It is designed to be easy to use, and it provides several options to customize the output.
The package uses Python's built-in html.parser module to parse the HTML and then converts the result to plain text.
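To illustrate the general approach of building a converter on top of html.parser, here is a minimal sketch. It is not html2text's actual implementation: the `TextExtractor` class and `to_text` helper are hypothetical names invented for this example, and it only handles paragraphs and links.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Toy converter: collects text and renders links in Markdown style."""

    def __init__(self, ignore_links=False):
        super().__init__()
        self.ignore_links = ignore_links
        self.parts = []    # output fragments, joined at the end
        self.href = None   # target of the <a> tag currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a" and not self.ignore_links:
            self.href = dict(attrs).get("href")
            if self.href:
                self.parts.append("[")

    def handle_endtag(self, tag):
        if tag == "a" and self.href:
            self.parts.append(f"]({self.href})")
            self.href = None
        elif tag == "p":
            self.parts.append("\n\n")  # paragraphs end with a blank line

    def handle_data(self, data):
        self.parts.append(data)


def to_text(html, ignore_links=False):
    """Feed the HTML through the parser and return the collected text."""
    parser = TextExtractor(ignore_links=ignore_links)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


html = "<p>Hello, <a href='https://www.google.com/earth/'>world</a>!</p>"
print(to_text(html))                     # Hello, [world](https://www.google.com/earth/)!
print(to_text(html, ignore_links=True))  # Hello, world!
```

The parser calls one handler per tag or text run, so the converter only has to decide what to emit for each event. A real converter covers many more tags and edge cases than this sketch.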
html2text also comes with a CLI tool that can convert HTML files to text:
```shell
Usage: html2text [filename [encoding]]

Option             Description
--version          Show program's version number and exit
-h, --help         Show this help message and exit
--ignore-links     Don't include any formatting for links
--escape-all       Escape all special characters. Output is less readable, but avoids corner case formatting issues.
--reference-links  Use reference links instead of links to create markdown
--mark-code        Mark preformatted and code blocks with [code]...[/code]
```
The gofeed library is a robust feed parser for Go that supports RSS, Atom, and JSON feeds. It provides a universal gofeed.Parser that parses any supported feed type and converts it into a hybrid gofeed.Feed model.
You can also use the feed-specific parsers atom.Parser, rss.Parser, and json.Parser, which generate atom.Feed, rss.Feed, and json.Feed respectively.
Supported feed types:
- RSS 0.90
- Netscape RSS 0.91
- Userland RSS 0.91
- RSS 0.92
- RSS 0.93
- RSS 0.94
- RSS 1.0
- RSS 2.0
- Atom 0.3
- Atom 1.0
- JSON 1.0
- JSON 1.1
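The idea of a universal parser, which detects the feed type and then maps it onto one shared model, can be sketched in a few lines. The example below is written in Python with the standard library only and is not gofeed's code or API (gofeed is a Go library). The `Feed` dataclass, `detect_feed_type`, and `parse` are hypothetical names, and only the feed title is extracted.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Feed:
    """Tiny stand-in for a shared feed model."""
    feed_type: str
    title: str


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def detect_feed_type(text):
    """Return 'rss', 'atom', 'json', or 'unknown' for a feed document."""
    stripped = text.lstrip()
    if stripped.startswith("{"):
        try:
            doc = json.loads(stripped)
        except ValueError:
            return "unknown"
        version = doc.get("version", "") if isinstance(doc, dict) else ""
        return "json" if "jsonfeed.org" in str(version) else "unknown"
    try:
        root = ET.fromstring(stripped)
    except ET.ParseError:
        return "unknown"
    name = local_name(root.tag).lower()
    if name in ("rss", "rdf"):  # RSS 1.0 documents use an rdf:RDF root
        return "rss"
    if name == "feed":
        return "atom"
    return "unknown"


def parse(text):
    """Detect the feed type, then normalize into the shared Feed model."""
    kind = detect_feed_type(text)
    if kind == "json":
        return Feed(kind, json.loads(text).get("title", ""))
    if kind in ("rss", "atom"):
        # Simplification: take the first <title> element in document order.
        for element in ET.fromstring(text.lstrip()).iter():
            if local_name(element.tag) == "title":
                return Feed(kind, (element.text or "").strip())
        return Feed(kind, "")
    raise ValueError("unrecognized feed format")


rss = "<rss version='2.0'><channel><title>Example RSS</title></channel></rss>"
atom = "<feed xmlns='http://www.w3.org/2005/Atom'><title>Example Atom</title></feed>"
json_feed = '{"version": "https://jsonfeed.org/version/1.1", "title": "Example JSON"}'

for document in (rss, atom, json_feed):
    print(parse(document))
```

Callers work with a single `Feed` type regardless of the input format, which is the convenience a universal parser offers over format-specific ones.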
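Example Use

```python
import html2text

h = html2text.HTML2Text()

# Ignore converting links from HTML
h.ignore_links = True
print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
# Hello, world!

# Don't ignore links anymore, I like links
h.ignore_links = False
print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
# Hello, [world](https://www.google.com/earth/)!
```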