newspaper is a Python package that allows developers to easily extract text, images, and videos from articles on the web.
It is designed to be fast, easy to use, and compatible with a wide variety of websites. It uses advanced algorithms to extract relevant information and metadata from articles, and it also supports several languages.
newspaper includes its own HTTP client for downloading pages, or it can ingest pre-scraped HTML documents.
from newspaper import Article

# Create a new article object
article = Article('https://www.example.com/article')

# Download the article
article.download()

# Parse the article
article.parse()

# Print the article text
print(article.text)

# Print the article title
print(article.title)

# Print the article authors
print(article.authors)

# Print the article publication date
print(article.publish_date)
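
The pre-scraped route mentioned above can be sketched as follows. This is a minimal illustration, not the package's official recipe: it assumes the `input_html` keyword of `Article.download()` as found in newspaper3k, and the helper name `parse_prescraped`, the `SAMPLE_HTML` string, and the example URL are all hypothetical.

```python
# Hypothetical sample document standing in for HTML you scraped yourself.
SAMPLE_HTML = """
<html>
  <head><title>Example headline</title></head>
  <body>
    <h1>Example headline</h1>
    <p>This is the body of an article that was fetched by another tool.</p>
  </body>
</html>
"""


def parse_prescraped(html, url="https://www.example.com/article"):
    """Parse HTML that was fetched elsewhere, without any network request.

    Assumption: Article.download() accepts an `input_html` keyword
    (present in newspaper3k); check this against your installed version.
    """
    # Imported inside the function so this snippet can be loaded even
    # where the newspaper package is not installed.
    from newspaper import Article

    article = Article(url)
    # Hand over the pre-scraped HTML instead of letting newspaper fetch it.
    article.download(input_html=html)
    article.parse()
    return article


if __name__ == "__main__":
    article = parse_prescraped(SAMPLE_HTML)
    print(article.title)
    print(article.text)
```

The URL is still passed to `Article` so that relative links and source metadata can be resolved, but no request is made when the HTML is supplied directly. Very short documents like the sample may yield little or no body text.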