sumy

3,457 2 22 Apache-2.0
0.11.0 (23 Oct 2022) Oct 20 2013 546.0 thousand (month)

sumy is a Python library for automatic summarization of text documents. It can summarize plain text and HTML, loaded from a string, a file, or a URL. It supports multiple languages and several summarization algorithms, including Latent Semantic Analysis (LSA), Luhn, Edmundson, TextRank, and SumBasic.
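
The algorithms listed above are extractive: they score the sentences of a document and return the highest-scoring ones unchanged. The following is a minimal sketch of that idea using plain word frequencies. It is not sumy's implementation of any of these algorithms; the tiny stop-word list, the naive sentence splitter, and all function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (sumy ships per-language lists).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "it", "that", "on", "for"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, sentences_count):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    frequencies = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average word frequency, so long sentences are not favored automatically.
        return sum(frequencies[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:sentences_count])  # restore document order
    return [sentences[i] for i in chosen]
```

Calling `summarize(text, 2)` on a short passage returns the two sentences whose words occur most often across the whole passage. Real summarizers refine this with stemming, proper tokenization, and more robust scoring.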

Example Use


# -*- coding: utf-8 -*-

from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals

from sumy.parsers.html import HtmlParser
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words


LANGUAGE = "english"
SENTENCES_COUNT = 10


if __name__ == "__main__":
    url = "https://en.wikipedia.org/wiki/Automatic_summarization"
    parser = HtmlParser.from_url(url, Tokenizer(LANGUAGE))
    # or for plain text files
    # parser = PlaintextParser.from_file("document.txt", Tokenizer(LANGUAGE))
    # parser = PlaintextParser.from_string("Check this out.", Tokenizer(LANGUAGE))
    stemmer = Stemmer(LANGUAGE)

    summarizer = Summarizer(stemmer)
    summarizer.stop_words = get_stop_words(LANGUAGE)

    for sentence in summarizer(parser.document, SENTENCES_COUNT):
        print(sentence)
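
The example above hard-codes the LSA summarizer and an HTML source. As a sketch of how the same pipeline might be reused for plain-text strings with a selectable algorithm, the hypothetical helper below maps an algorithm name to a sumy summarizer class. The helper name `summarize_text`, the `SUMMARIZERS` table, and its keys are assumptions for illustration and not part of sumy. Edmundson is omitted here because it expects additional word lists to be set before it can run.

```python
from importlib import import_module

# Algorithm name -> (module path, class name) for summarizers that need only a stemmer.
SUMMARIZERS = {
    "lsa": ("sumy.summarizers.lsa", "LsaSummarizer"),
    "luhn": ("sumy.summarizers.luhn", "LuhnSummarizer"),
    "text_rank": ("sumy.summarizers.text_rank", "TextRankSummarizer"),
    "sum_basic": ("sumy.summarizers.sum_basic", "SumBasicSummarizer"),
}


def summarize_text(text, algorithm="lsa", sentences_count=3, language="english"):
    """Summarize a plain-text string with the chosen sumy summarizer (hypothetical helper)."""
    if algorithm not in SUMMARIZERS:
        raise ValueError(f"unknown algorithm: {algorithm!r}")

    # Import sumy lazily so the name check above works even without sumy installed.
    from sumy.nlp.stemmers import Stemmer
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.utils import get_stop_words

    module_path, class_name = SUMMARIZERS[algorithm]
    summarizer_class = getattr(import_module(module_path), class_name)

    parser = PlaintextParser.from_string(text, Tokenizer(language))
    summarizer = summarizer_class(Stemmer(language))
    summarizer.stop_words = get_stop_words(language)
    return [str(sentence) for sentence in summarizer(parser.document, sentences_count)]
```

A call such as `summarize_text(article, algorithm="luhn", sentences_count=5)` would return up to five sentences as strings. Note that `Tokenizer` relies on NLTK tokenizer data, which may need to be downloaded before first use.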
