Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization

We provide the source code for the paper "Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization", accepted at EMNLP'18. If you find the code useful, please cite the following paper.

@inproceedings{lebanoff-song-liu:2018,
 Author = {Logan Lebanoff and Kaiqiang Song and Fei Liu},
 Title = {Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization},
 Booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
 Year = {2018}}

Goal

  • Our system summarizes a set of about ten articles on the same topic.

  • The code takes as input a text file containing a set of articles. See below for the input format of the files.

Dependencies

The code is written in Python (v2.7) and TensorFlow (v1.4.1).
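The README does not spell out the environment beyond these versions; a minimal setup matching them might look like the following (the use of pip and the exact pin are assumptions, not part of the original instructions):

```shell
# Hypothetical setup for the stated environment (Python 2.7, TensorFlow 1.4.1).
pip install 'tensorflow==1.4.1'
# Verify the installed version before running the scripts below.
python -c 'import tensorflow as tf; print(tf.__version__)'
```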

How to Generate Summaries

  1. Clone this repo. Download this ZIP file containing the pretrained model from See et al. Move the folder pretrained_model_tf1.2.1 into the ./logs/ directory.

    $ git clone https://github.com/ucfnlp/multidoc_summarization/
    $ mv pretrained_model_tf1.2.1.zip multidoc_summarization/logs
    $ cd multidoc_summarization/logs
    $ unzip pretrained_model_tf1.2.1.zip
    $ rm pretrained_model_tf1.2.1.zip
    $ cd ..
    
  2. Format your data in the following way:

    One file for each topic. Distinct articles are separated by a single blank line (i.e., two consecutive newline characters, \n\n). Each sentence of an article goes on its own line. See ./example_custom_dataset/ for an example.

  3. Convert your data to TensorFlow examples that can be fed to the PG-MMR model.

    $ python convert_data.py --dataset=example_custom_dataset --custom_dataset_path=./example_custom_dataset/
    
  4. Run the testing script. The summary files will be written to the ./logs/example_custom_dataset/ directory.

    $ python run_summarization.py --dataset_name=example_custom_dataset --pg_mmr
    
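As a sanity check on the input format described in step 2, the following Python sketch writes a topic file in that format and reads the articles back (the file name and sentences are made up for illustration; this helper is not part of the repository):

```python
import os
import tempfile

def write_topic_file(path, articles):
    """Write one topic file: articles separated by a blank line,
    with each sentence of an article on its own line."""
    with open(path, "w") as f:
        f.write("\n\n".join("\n".join(sentences) for sentences in articles))
        f.write("\n")

articles = [
    ["First sentence of article one.", "Second sentence of article one."],
    ["First sentence of article two."],
]
path = os.path.join(tempfile.gettempdir(), "example_topic.txt")
write_topic_file(path, articles)

# Recover the articles by splitting on the blank line.
with open(path) as f:
    blocks = f.read().strip().split("\n\n")
print(len(blocks))  # → 2 (one block per article)
```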

License

This project is licensed under the BSD License - see the LICENSE.md file for details.

Acknowledgments

We gratefully acknowledge the work of Abigail See whose code was used as a basis for this project.
