This repository contains scripts and instructions for downloading the 20 Minuten ("20 Minutes") dataset.
The dataset was introduced in the paper "A New Dataset and Efficient Baselines for Document-level Text Simplification in German", presented at the Third Workshop on New Frontiers in Summarization at EMNLP 2021.
To download and extract the data, use the provided shell script:
bash data/2021_EMNLP_newsum/download_2021_EMNLP_newsum.sh
You can also download the data directly from here.
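The script path in the command above is relative, so it presumably has to be run from the repository root. Below is a minimal sketch of a wrapper that checks for the script before running it. The function name `run_download` is hypothetical and not part of this repository, and the sketch makes no assumption about what the script downloads or where it writes its output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): run the download script
# only if it can be found relative to the current working directory.
run_download() {
    local script="data/2021_EMNLP_newsum/download_2021_EMNLP_newsum.sh"
    if [ ! -f "$script" ]; then
        # The relative path only resolves from the repository root.
        echo "Cannot find $script; run this from the repository root." >&2
        return 1
    fi
    bash "$script"
}
```

After sourcing the snippet, calling `run_download` from the repository root behaves like the plain `bash` command; from any other directory it prints an error and returns a non-zero status instead of failing with a less obvious message.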