Skip to content

gofeedvsnewspaper

MIT 41 2 2,400
58.1 thousand (month) Apr 20 2016 v1.3.0(a month ago)
13,656 6 520 MIT
0.2.8(5 years ago) Dec 28 2012 706.6 thousand (month)

The gofeed library is a robust feed parser that supports parsing both RSS, Atom and JSON feeds. The library provides a universal gofeed.Parser that will parse and convert all feed types into a hybrid gofeed.Feed model.

You also have the option of utilizing the feed specific atom.Parser or rss.Parser or json.Parser parsers which generate atom. Feed , rss.Feed and json.Feed respectively.

Supported feed types:

  • RSS 0.90
  • Netscape RSS 0.91
  • Userland RSS 0.91
  • RSS 0.92
  • RSS 0.93
  • RSS 0.94
  • RSS 1.0
  • RSS 2.0
  • Atom 0.3
  • Atom 1.0
  • JSON 1.0
  • JSON 1.1

newspaper is a Python package that allows developers to easily extract text, images, and videos from articles on the web.

It is designed to be fast, easy to use, and compatible with a wide variety of websites. It uses advanced algorithms to extract relevant information and metadata from articles, and it also supports several languages.

newspaper includes a http client or can ingest pre-scraped HTML documents.

Example Use


// parse feed from URL
fp := gofeed.NewParser()
fp.UserAgent = "MyCustomAgent 1.0"  // we can modify http client with custom headers etc.
feed, _ := fp.ParseURL("http://feeds.twit.tv/twit.xml")
fmt.Println(feed.Title)

// parse feed from string
feedData := `<rss version="2.0">
<channel>
<title>Sample Feed</title>
</channel>
</rss>`
fp := gofeed.NewParser()
feed, _ := fp.ParseString(feedData)
fmt.Println(feed.Title)

// or file
file, _ := os.Open("/path/to/a/file.xml")
defer file.Close()
fp := gofeed.NewParser()
feed, _ := fp.Parse(file)
fmt.Println(feed.Title)
from newspaper import Article

# Create a new article object
article = Article('https://www.example.com/article')

# Download the article
article.download()

# Parse the article
article.parse()

# Print the article text
print(article.text)

# Print the article title
print(article.title)

# Print the article authors
print(article.authors)

# Print the article publication date
print(article.publish_date)

Alternatives / Similar