newspaper
newspaper is a Python package that allows developers to easily extract text, images, and videos from articles on the web.
It is designed to be fast, easy to use, and compatible with a wide variety of websites. It extracts the main content and metadata of an article, such as the title, authors, and publication date, and it supports several languages.
newspaper includes an HTTP client for downloading pages, and it can also ingest pre-scraped HTML documents.
Example Use
from newspaper import Article
# Create a new article object
article = Article('https://www.example.com/article')
# Download the article
article.download()
# Parse the article
article.parse()
# Print the article text
print(article.text)
# Print the article title
print(article.title)
# Print the article authors
print(article.authors)
# Print the article publication date
print(article.publish_date)