
readability vs gofeed

Apache-2.0 37 5 2,894
1.6 million (month) Jun 30 2011 0.8.4.1(2025-05-03 21:11:43 ago)
2,824 2 55 MIT
Apr 20 2016 58.1 thousand (month) v1.3.0(2024-03-01 03:34:34 ago)

python-readability is a Python package that extracts the main content of a web page and removes unwanted elements such as ads, navigation, and sidebars.

It is based on the algorithm used by the popular web-based service Readability, and it uses the lxml package to parse the HTML and extract the main content.

Readability is similar to Newspaper in that both extract article content from HTML.
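To make the idea of "main content without navigation and sidebars" concrete, here is a toy sketch that uses only the Python standard library. It is not the scoring algorithm python-readability uses; the `SKIP_TAGS` set, the `ParagraphCollector` class, and the `extract_main_text` helper are all hypothetical names chosen for illustration, and the rule "keep `<p>` text unless it sits inside a boilerplate tag" is a deliberate simplification.

```python
from html.parser import HTMLParser

# Tags whose contents this toy example treats as boilerplate (an assumption,
# not the heuristic python-readability actually applies).
SKIP_TAGS = {"nav", "aside", "header", "footer", "script", "style"}


class ParagraphCollector(HTMLParser):
    """Collect the text of <p> elements that are not inside a skipped tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0        # > 0 while inside a boilerplate tag
        self.in_paragraph = False  # True between <p> and </p>
        self.paragraphs = []       # finished paragraph texts
        self._buffer = []          # text pieces of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p" and self.in_paragraph:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph and self.skip_depth == 0:
            self._buffer.append(data)


def extract_main_text(html):
    """Return the list of paragraph texts found outside boilerplate tags."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    return collector.paragraphs
```

A real extractor has to cope with pages that do not mark up boilerplate so cleanly, which is why libraries such as python-readability rely on scoring heuristics rather than a fixed tag list.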

The gofeed library is a robust feed parser that supports RSS, Atom and JSON feeds. The library provides a universal gofeed.Parser that parses all feed types and converts them into a hybrid gofeed.Feed model.

You also have the option of using the feed-specific atom.Parser, rss.Parser or json.Parser, which generate atom.Feed, rss.Feed and json.Feed respectively.
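The "universal parser plus common model" design described above can be sketched in a few lines of Python using only the standard library. This is an illustration of the idea, not gofeed's implementation: the `Feed` dataclass and `parse_feed` function are hypothetical, the model carries only a type and a title, and RDF-based RSS (0.90 and 1.0) is not handled.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass

ATOM_NS = "{http://www.w3.org/2005/Atom}"


@dataclass
class Feed:
    """Hypothetical unified model; the real gofeed.Feed has many more fields."""
    feed_type: str
    title: str


def parse_feed(data: str) -> Feed:
    """Detect the feed family and map it onto the common Feed model."""
    text = data.lstrip()
    if text.startswith("{"):
        # JSON Feed documents are JSON objects with a top-level "title".
        doc = json.loads(text)
        return Feed("json", doc.get("title", ""))

    root = ET.fromstring(text)
    if root.tag == ATOM_NS + "feed":
        return Feed("atom", root.findtext(ATOM_NS + "title", default=""))
    if root.tag == "rss":
        return Feed("rss", root.findtext("channel/title", default=""))
    raise ValueError(f"unsupported feed root element: {root.tag}")
```

Callers work with one `Feed` type regardless of the source format, which is the convenience a universal parser offers; the format-specific parsers remain useful when fields that do not map onto the common model are needed.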

Supported feed types:

  • RSS 0.90
  • Netscape RSS 0.91
  • Userland RSS 0.91
  • RSS 0.92
  • RSS 0.93
  • RSS 0.94
  • RSS 1.0
  • RSS 2.0
  • Atom 0.3
  • Atom 1.0
  • JSON 1.0
  • JSON 1.1
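As a rough illustration of how these versions can be told apart, the sketch below inspects the root element, its namespace, and the `version` field. The `detect_feed_version` function is hypothetical and is not how gofeed detects feed types; it does not distinguish Netscape from Userland RSS 0.91, and it reports both RDF-based variants (RSS 0.90 and 1.0) with a single label.

```python
import json
import xml.etree.ElementTree as ET

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"

# Atom versions are identified by the namespace of the root <feed> element.
ATOM_VERSIONS = {
    "{http://www.w3.org/2005/Atom}feed": "Atom 1.0",
    "{http://purl.org/atom/ns#}feed": "Atom 0.3",
}

# JSON Feed versions are identified by the URL in the "version" field.
JSON_VERSIONS = {
    "https://jsonfeed.org/version/1": "JSON 1.0",
    "https://jsonfeed.org/version/1.1": "JSON 1.1",
}


def detect_feed_version(data: str) -> str:
    """Return a human-readable label for the feed format and version."""
    text = data.lstrip()
    if text.startswith("{"):
        version = json.loads(text).get("version", "")
        return JSON_VERSIONS.get(version, "JSON (unknown version)")

    root = ET.fromstring(text)
    if root.tag in ATOM_VERSIONS:
        return ATOM_VERSIONS[root.tag]
    if root.tag == "rss":
        # RSS 0.91 through 2.0 carry the version as an attribute on <rss>.
        return "RSS " + root.get("version", "(unknown version)")
    if root.tag == RDF_ROOT:
        return "RSS 0.90 or 1.0 (RDF-based)"
    return "unknown"
```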

Example Use


```python
import requests
from readability import Document

response = requests.get('http://example.com')
doc = Document(response.content)

doc.title()
# 'Example Domain'

doc.summary()
# Returns the extracted main content as an HTML string. For this page it contains:
#   Example Domain
#   This domain is established to be used for illustrative examples in documents. You may use this
#   domain in examples without prior coordination or asking for permission.
#   More information...
```

```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/mmcdole/gofeed"
)

func main() {
	fp := gofeed.NewParser()
	fp.UserAgent = "MyCustomAgent 1.0" // we can modify the HTTP client with custom headers etc.

	// Parse a feed from a URL.
	feed, err := fp.ParseURL("http://feeds.twit.tv/twit.xml")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(feed.Title)

	// Parse a feed from a string.
	feedData := `<rss version="2.0">
<channel>
<title>Sample Feed</title>
</channel>
</rss>`
	feed, err = fp.ParseString(feedData)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(feed.Title)

	// Or parse a feed from a file.
	file, err := os.Open("/path/to/a/file.xml")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	feed, err = fp.Parse(file)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(feed.Title)
}
```


