Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization

We provide the source code for the paper "Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization", accepted at EMNLP'18. If you find the code useful, please cite the following paper.

@inproceedings{lebanoff-song-liu:2018,
 Author = {Logan Lebanoff and Kaiqiang Song and Fei Liu},
 Title = {Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization},
 Booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
 Year = {2018}}

Goal

  • Our system summarizes a set of about ten articles on the same topic.

  • The code takes as input a text file containing a set of articles. See below for the input format of the files.

Dependencies

The code is written in Python (v2.7) and TensorFlow (v1.4.1).
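The README does not spell out the environment beyond these versions; a minimal setup matching them might look like the following (the use of pip and the exact pin are assumptions, not part of the original instructions):

```shell
# Hypothetical setup for the stated environment (Python 2.7, TensorFlow 1.4.1).
pip install 'tensorflow==1.4.1'
# Verify the installed version before running the scripts below.
python -c 'import tensorflow as tf; print(tf.__version__)'
```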

How to Generate Summaries

  1. Clone this repo. Download this ZIP file containing the pretrained model from See et al. Move the folder pretrained_model_tf1.2.1 into the ./logs/ directory.

    $ git clone https://github.com/ucfnlp/multidoc_summarization/
    $ mv pretrained_model_tf1.2.1.zip multidoc_summarization/logs
    $ cd multidoc_summarization/logs
    $ unzip pretrained_model_tf1.2.1.zip
    $ rm pretrained_model_tf1.2.1.zip
    $ cd ..
    
  2. Format your data in the following way:

    One file for each topic. Distinct articles are separated by a single blank line (i.e., two consecutive newline characters, \n\n). Each sentence of an article goes on its own line. See ./example_custom_dataset/ for an example.

  3. Convert your data to TensorFlow examples that can be fed to the PG-MMR model.

    $ python convert_data.py --dataset=example_custom_dataset --custom_dataset_path=./example_custom_dataset/
    
  4. Run the testing script. The summary files will be written to the ./logs/example_custom_dataset/ directory.

    $ python run_summarization.py --dataset_name=example_custom_dataset --pg_mmr
    
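As a sanity check on the input format described in step 2, the following Python sketch writes a topic file in that format and reads the articles back (the file name and sentences are made up for illustration; this helper is not part of the repository):

```python
import os
import tempfile

def write_topic_file(path, articles):
    """Write one topic file: articles separated by a blank line,
    with each sentence of an article on its own line."""
    with open(path, "w") as f:
        f.write("\n\n".join("\n".join(sentences) for sentences in articles))
        f.write("\n")

articles = [
    ["First sentence of article one.", "Second sentence of article one."],
    ["First sentence of article two."],
]
path = os.path.join(tempfile.gettempdir(), "example_topic.txt")
write_topic_file(path, articles)

# Recover the articles by splitting on the blank line.
with open(path) as f:
    blocks = f.read().strip().split("\n\n")
print(len(blocks))  # → 2 (one block per article)
```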

License

This project is licensed under the BSD License - see the LICENSE.md file for details.

Acknowledgments

We gratefully acknowledge the work of Abigail See whose code was used as a basis for this project.
