Utility for using Hugging Face transformers summarization models on text documents
The purpose of this package is to provide a simple interface (Python API, CLI, Gradio web UI) for using summarization models on text documents of arbitrary length.
⚠️ WARNING: This package is a WIP and is not ready for production use. Some things may not work yet. ⚠️
Installation
Install using pip:
# create a virtual environment (optional)
pip install textsum
The textsum package is now installed in your virtual environment. You can now use the CLI or the Python API to summarize text documents; see the Usage section for more details.
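To confirm the installation from Python, you can query the installed distribution metadata with the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is illustrative only and is not part of textsum.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "textsum") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for illustration; not part of the textsum package.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(f"textsum version: {installed_version() or 'not installed'}")
```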
Full Installation
To install all the dependencies (including PDF OCR, the Gradio UI demo, optimum, etc.), run:
git clone https://github.com/pszemraj/textsum.git
cd textsum
# create a virtual environment (optional)
pip install -e .[all]
Additional Details
This package uses the clean-text Python package and, like the "base" version of clean-text, does not include the GPL-licensed unidecode dependency. If you want to use unidecode, install it as an extra with pip:
pip install textsum[unidecode]
In practice, whether or not unidecode is used to clean the text before summarization should not make a significant difference to the output.
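To illustrate why this choice rarely matters, here is a minimal sketch of ASCII transliteration that uses unidecode when it is installed and otherwise falls back to the standard library. The helper `ascii_fold` is hypothetical and is not how clean-text implements its own fallback. For accented Latin text both paths give the same result; they differ mainly on non-Latin scripts, which unidecode transliterates while the fallback simply drops.

```python
import unicodedata

try:
    # Optional GPL-licensed dependency (e.g. installed via the unidecode extra).
    from unidecode import unidecode
except ImportError:
    unidecode = None


def ascii_fold(text: str) -> str:
    """Transliterate text to ASCII, preferring unidecode when available.

    Hypothetical helper for illustration; not part of textsum or clean-text.
    """
    if unidecode is not None:
        return unidecode(text)
    # Fallback: NFKD splits accented letters into base letter + combining mark,
    # then encoding to ASCII with "ignore" drops the marks and any other
    # non-ASCII code points.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


print(ascii_fold("Café naïve résumé"))
```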
Usage
There are three ways to use this package:
Python API
CLI
Demo App
Python API
To use the Python API, import the Summarizer class and instantiate it; this loads the default model and parameters.
You can then use the summarize_string method to summarize a long string of text.
from textsum.summarize import Summarizer

summarizer = Summarizer()  # loads default model and parameters

# summarize a long string
out_str = summarizer.summarize_string('This is a long string of text that will be summarized.')
print(f'summary: {out_str}')
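Transformer summarization models accept only a limited number of input tokens, so summarizing text of arbitrary length generally means splitting it into chunks, summarizing each chunk, and joining the results. The sketch below illustrates that general idea with overlapping windows; it is not textsum's actual implementation. The helper names `chunk_tokens` and `summarize_long`, the `batch_length`/`stride` parameters, and the use of whitespace-separated words instead of model tokenizer tokens are all assumptions for illustration.

```python
from typing import Callable, List


def chunk_tokens(tokens: List[str], batch_length: int, stride: int) -> List[List[str]]:
    """Split tokens into windows of at most batch_length items.

    Consecutive windows overlap by `stride` tokens so that context at a
    boundary appears in both neighbouring chunks. Hypothetical helper.
    """
    if batch_length <= 0:
        raise ValueError("batch_length must be positive")
    if not 0 <= stride < batch_length:
        raise ValueError("stride must satisfy 0 <= stride < batch_length")
    step = batch_length - stride
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + batch_length])
        if start + batch_length >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


def summarize_long(
    text: str,
    summarize_chunk: Callable[[str], str],
    batch_length: int = 512,
    stride: int = 16,
) -> str:
    """Summarize each chunk with `summarize_chunk` and join the partial summaries.

    Hypothetical helper; words stand in for real tokenizer tokens here.
    """
    words = text.split()
    parts = [summarize_chunk(" ".join(chunk)) for chunk in chunk_tokens(words, batch_length, stride)]
    return "\n".join(parts)


# Toy "model" that keeps only the first word of each chunk.
sample = "one two three four five six seven eight nine ten"
print(summarize_long(sample, lambda t: t.split()[0], batch_length=4, stride=1))
```

With a window of 4 words and an overlap of 1, the sample yields the chunks "one two three four", "four five six seven", and "seven eight nine ten". A larger overlap preserves more context across boundaries at the cost of more model calls.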
You can also summarize a file directly:
out_path = summarizer.summarize_file('/path/to/file.txt')
print(f'summary saved to {out_path}')
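summarize_file handles one file at a time. To process a whole folder from Python, a small loop like the following could work. `summarize_directory` is a hypothetical helper, not part of textsum; it only assumes the summarizer object exposes `summarize_file(path)` as shown above, that this method accepts a path string, and that it returns the output path.

```python
from pathlib import Path
from typing import List


def summarize_directory(summarizer, input_dir: str, pattern: str = "*.txt") -> List:
    """Call summarizer.summarize_file on each matching file; return the output paths.

    Hypothetical helper: works with any object that has a
    summarize_file(path_str) method, such as the Summarizer above.
    """
    out_paths = []
    for path in sorted(Path(input_dir).glob(pattern)):  # sorted for a stable order
        if path.is_file():
            out_paths.append(summarizer.summarize_file(str(path)))
    return out_paths


# Example (downloads/loads a model, so not run here):
# from textsum.summarize import Summarizer
# print(summarize_directory(Summarizer(), "/path/to/docs"))
```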
CLI
To summarize a directory of text files, run the following command: