Newsela Split #15

chrhad · 2018-08-03T09:39:13Z

Hi,

I would like to clarify the Newsela data setup:

Am I right that the originally released data in 2015 (1,130 articles) was used in the paper? (That is, the text file in “newsela_data_share-20150302” in the Newsela release)
Following the description in Section 5, I took the first 1,070 articles for training, the next 30 for development, and the next 30 for testing, and then filtered out sentence pairs with alignment levels 0-1, 1-2, and 2-3. This gave me 94,944 training, 2,531 development, and 2,462 test sentence pairs, which differ from the numbers in the paper. How can I arrive at the 94,208 training, 1,129 development, and 1,076 test sentence pairs reported in the paper?
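
To make the procedure above concrete, here is a minimal sketch of the split-and-filter step. It is not the paper's code. It assumes the aligned sentences have already been loaded as tuples of `(doc_id, src_level, tgt_level, src_sentence, tgt_sentence)`, and that "first 1,070 articles" means the order in which articles first appear in the file. The function name `split_and_filter` and the tuple layout are hypothetical.

```python
# Level pairs to drop, as described in the post: 0-1, 1-2, and 2-3.
EXCLUDED_LEVEL_PAIRS = {(0, 1), (1, 2), (2, 3)}


def split_and_filter(pairs, n_train=1070, n_dev=30, n_test=30):
    """Split aligned sentence pairs by article, then drop excluded level pairs.

    `pairs` is an iterable of (doc_id, src_level, tgt_level, src_sent, tgt_sent).
    This layout is an assumption, not the actual Newsela file format.
    """
    pairs = list(pairs)

    # Article order is taken as order of first appearance (an assumption).
    article_order = list(dict.fromkeys(p[0] for p in pairs))

    train_ids = set(article_order[:n_train])
    dev_ids = set(article_order[n_train:n_train + n_dev])
    test_ids = set(article_order[n_train + n_dev:n_train + n_dev + n_test])

    splits = {"train": [], "dev": [], "test": []}
    for doc_id, src_level, tgt_level, src_sent, tgt_sent in pairs:
        # Skip sentence pairs whose alignment levels are in the excluded set.
        if (src_level, tgt_level) in EXCLUDED_LEVEL_PAIRS:
            continue
        if doc_id in train_ids:
            splits["train"].append((src_sent, tgt_sent))
        elif doc_id in dev_ids:
            splits["dev"].append((src_sent, tgt_sent))
        elif doc_id in test_ids:
            splits["test"].append((src_sent, tgt_sent))
        # Articles beyond the first n_train + n_dev + n_test are ignored.
    return splits
```

Calling `split_and_filter(pairs)` and taking `len()` of each list gives the counts to compare against the paper. A different article ordering or an extra filtering step would change those counts.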

Thank you.

Regards,
Christian

saraswat · 2018-08-04T23:02:33Z

Where can I find the Newsela release?
