Skip to content

sumyvsreadability

Apache-2.0 28 4 3,670
152.5 thousand (month) Oct 20 2013 0.12.0(2026-02-14 21:00:12 ago)
2,894 5 37 Apache-2.0
Jun 30 2011 1.6 million (month) 0.8.4.1(2025-05-03 21:11:43 ago)

sumy is a Python library for automatic summarization of text documents. It can be used to extract summaries from various input formats such as plaintext, HTML, and URLs. It supports multiple languages and multiple summarization algorithms, including Latent Semantic Analysis (LSA), Luhn, Edmundson, TextRank, and SumBasic.

python-readability is a python package that allows developers to extract the main content of a web page, removing any unnecessary or unwanted elements, such as ads, navigation, and sidebars.

It is based on the algorithm used by the popular web-based service, Readability, and it uses the beautifulsoup4 package to parse the HTML and extract the main content.

Readability is similar to Newspaper in terms that it's extracting HTML data

Example Use


```python # -*- coding: utf-8 -*- from __future__ import absolute_import from __future__ import division, print_function, unicode_literals from sumy.parsers.html import HtmlParser from sumy.parsers.plaintext import PlaintextParser from sumy.nlp.tokenizers import Tokenizer from sumy.summarizers.lsa import LsaSummarizer as Summarizer from sumy.nlp.stemmers import Stemmer from sumy.utils import get_stop_words LANGUAGE = "english" SENTENCES_COUNT = 10 if __name__ == "__main__": url = "https://en.wikipedia.org/wiki/Automatic_summarization" parser = HtmlParser.from_url(url, Tokenizer(LANGUAGE)) # or for plain text files # parser = PlaintextParser.from_file("document.txt", Tokenizer(LANGUAGE)) # parser = PlaintextParser.from_string("Check this out.", Tokenizer(LANGUAGE)) stemmer = Stemmer(LANGUAGE) summarizer = Summarizer(stemmer) summarizer.stop_words = get_stop_words(LANGUAGE) for sentence in summarizer(parser.document, SENTENCES_COUNT): print(sentence) ```
```python import requests from readability import document response = requests.get('http://example.com') doc = document(response.content) doc.title() 'example domain' doc.summary() """
\n
\n

example domain

\n

this domain is established to be used for illustrative examples in documents. you may use this\n domain in examples without prior coordination or asking for permission.

\n

more information...

\n
\n\n

""" ```

Alternatives / Similar


Was this page helpful?