A synthetic corpus of dialogs was constructed from the LibriSpeech corpus and is made freely available for diarization research. It includes over 90 hours of training data and over 9 hours each of development and test data. Both 2-person and 3-person dialogs, with and without overlap, are included. Timing information is provided in several formats and includes phoneme segmentations as well as speaker segmentations. This makes it a useful starting point for diarization system development in general, and for early-stage development in particular.
The corpus contains 4 top-level directories:
librispeech2: 2-person dialogs
librispeech2o: 2-person dialogs with overlap
librispeech3: 3-person dialogs
librispeech3o: 3-person dialogs with overlap
All sub-directories are "Kaldi table" data directories.
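Kaldi data directories usually store each table as a plain-text file with one entry per line: a key (such as an utterance or recording ID) followed by its value. The sketch below reads such a file into a dictionary. It assumes the standard Kaldi text layout, and `parse_kaldi_table` and `read_kaldi_table` are hypothetical helpers, not part of Kaldi or of this corpus.

```python
def parse_kaldi_table(lines):
    """Parse Kaldi-style text table lines into a {key: value} dict.

    The key is the first whitespace-delimited token; the value is the rest
    of the line, which may itself contain spaces (e.g. a piped command).
    Blank lines are skipped.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        parts = line.split(maxsplit=1)
        table[parts[0]] = parts[1] if len(parts) > 1 else ""
    return table


def read_kaldi_table(path):
    """Hypothetical convenience wrapper: parse a table file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_kaldi_table(f)
```

For example, if a directory contains a standard Kaldi `wav.scp` (an assumption; check the actual file names), `read_kaldi_table` on it would map recording IDs to audio paths or commands.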
Audio files are encoded as 16 kHz, 16-bit little-endian PCM, mono.
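Before feeding files to a pipeline, it can help to confirm they match the stated encoding. This minimal sketch uses Python's standard `wave` module and assumes the audio is stored in RIFF/WAV containers (the README states the encoding, not the container). WAV stores 16-bit PCM samples little-endian, so checking the sample width, channel count, and sample rate covers the stated format. `check_wav_format` is a hypothetical helper.

```python
import wave


def check_wav_format(path, expected_rate=16000):
    """Check a WAV file is uncompressed 16-bit mono PCM at expected_rate.

    Returns (n_frames, duration_seconds); raises ValueError on a mismatch.
    """
    with wave.open(path, "rb") as w:
        if w.getcomptype() != "NONE":
            raise ValueError(f"{path}: compressed audio ({w.getcomptype()})")
        if w.getnchannels() != 1:
            raise ValueError(f"{path}: expected mono, got {w.getnchannels()} channels")
        if w.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            raise ValueError(f"{path}: expected 16-bit, got {8 * w.getsampwidth()}-bit")
        if w.getframerate() != expected_rate:
            raise ValueError(f"{path}: expected {expected_rate} Hz, got {w.getframerate()} Hz")
        n_frames = w.getnframes()
        return n_frames, n_frames / w.getframerate()
```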
ctm - each line is F C BT DUR word
Where:
F The waveform filename. NOTE: no pathnames or extensions are expected.
C Speaker.
BT The begin time (seconds) of the segment, measured from the start time of the file.
DUR The duration (seconds) of the segment.
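A CTM file can be read by splitting each line on whitespace and mapping the first five fields to F, C, BT, DUR, and the word token. The sketch below does this, treating C as a speaker label as described above. The `CtmEntry` record and `parse_ctm` name are illustrative assumptions; any fields after the fifth (general CTM files may carry an optional confidence score) are ignored.

```python
from collections import namedtuple

# One CTM line: file, speaker (the C field in this corpus), begin, duration, token.
CtmEntry = namedtuple("CtmEntry", ["file", "speaker", "begin", "duration", "word"])


def parse_ctm(lines):
    """Parse CTM lines of the form 'F C BT DUR word' into CtmEntry records."""
    entries = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 5:
            raise ValueError(f"malformed CTM line: {line!r}")
        f, c, bt, dur, word = fields[:5]
        entries.append(CtmEntry(f, c, float(bt), float(dur), word))
    return entries
```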
labs - each line is a speaker ID, or 0 for a pause. Each line corresponds to 0.01 seconds (one 10 ms frame) of audio.
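Because each labs line covers 0.01 s, frame i starts at i * 0.01 s, and runs of identical labels can be collapsed into speaker segments. This sketch does that and drops pause frames (label "0"). It assumes exactly one label per line; the README does not say how frames with overlapping speech are labelled in the overlap directories, so treat results for those with care. `labs_to_segments` is a hypothetical helper.

```python
FRAME_SHIFT = 0.01  # seconds of audio per labs line


def labs_to_segments(labels, frame_shift=FRAME_SHIFT):
    """Collapse per-frame labels into (speaker, begin_s, end_s) runs.

    Frames labelled "0" (pauses) are dropped. Times are rounded to
    6 decimals to hide floating-point noise from the multiplication.
    """
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of input or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != "0":
                segments.append(
                    (labels[start], round(start * frame_shift, 6), round(i * frame_shift, 6))
                )
            start = i
    return segments
```

To apply it to a file, read the labels as stripped strings first, e.g. `[l.strip() for l in open(path) if l.strip()]`.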
rttm0 - Rich Transcription Time Marked (RTTM) format. The full specification is given in Appendix A of NIST's "The 2009 (RT-09) Rich Transcription Meeting Recognition Evaluation Plan".
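In RTTM, a SPEAKER line carries the file ID, channel, begin time, and duration in fields 2 to 5 and the speaker name in field 8, with unused fields written as `<NA>`. The sketch below reads and writes such lines. It handles only the SPEAKER type, assumes the 10-field layout of the RT-09 specification, and both function names are hypothetical.

```python
def parse_rttm_speakers(lines):
    """Return (file, channel, begin_s, duration_s, speaker) for each SPEAKER line."""
    out = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # ignore blank lines and other RTTM object types
        out.append((fields[1], fields[2], float(fields[3]), float(fields[4]), fields[7]))
    return out


def format_rttm_speaker(file_id, begin, duration, speaker, channel="1"):
    """Format one 10-field RTTM SPEAKER line; unused fields are <NA>."""
    return (
        f"SPEAKER {file_id} {channel} {begin:.2f} {duration:.2f} "
        f"<NA> <NA> {speaker} <NA> <NA>"
    )
```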
rttm - rttm0 with segments merged and pauses removed
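The README does not define exactly how rttm0 segments are merged. One plausible reading, sketched here purely as an assumption, is that pause segments are dropped and each speaker's touching (or nearly touching) segments are joined. Merging per speaker rather than across the whole timeline keeps the result sensible when speakers overlap. `merge_segments`, the `pause_label` default of "0" (borrowed from the labs convention), and the `max_gap` tolerance are all hypothetical.

```python
def merge_segments(segments, pause_label="0", max_gap=0.0):
    """Merge (speaker, begin_s, end_s) segments per speaker.

    Segments labelled pause_label are dropped (hypothetical convention).
    Consecutive segments of the same speaker are joined when the gap
    between them is at most max_gap seconds (a small epsilon absorbs
    floating-point error). Output is sorted by begin time, then speaker.
    """
    by_speaker = {}
    for spk, beg, end in sorted(segments, key=lambda s: (s[0], s[1])):
        if spk == pause_label:
            continue
        runs = by_speaker.setdefault(spk, [])
        if runs and beg - runs[-1][1] <= max_gap + 1e-9:
            runs[-1][1] = max(runs[-1][1], end)  # extend the current run
        else:
            runs.append([beg, end])  # start a new run for this speaker
    merged = [(spk, b, e) for spk, runs in by_speaker.items() for b, e in runs]
    return sorted(merged, key=lambda s: (s[1], s[0]))
```

With `max_gap=0.0` only exactly adjacent segments merge; a larger value also bridges short pauses within a speaker's turn.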
This corpus is licensed under CC BY 4.0 and requires citation of the following reference:
Edwards, E., Brenndoerfer, M., Robinson, A., Sadoughi, N., Finley, G. P., Korenevsky, M., Axtmann, N., Miller, M. & Suendermann-Oeft, D. (2018, September). A Free Synthetic Corpus for Speaker Diarization Research. In International Conference on Speech and Computer (pp. 113-122). Springer, Cham.
@inproceedings{edwards2018free,
title={A Free Synthetic Corpus for Speaker Diarization Research},
author={Edwards, Erik and Brenndoerfer, Michael and Robinson, Amanda and Sadoughi, Najmeh and Finley, Greg P and Korenevsky, Maxim and Axtmann, Nico and Miller, Mark and Suendermann-Oeft, David},
booktitle={International Conference on Speech and Computer},
pages={113--122},
year={2018},
organization={Springer}
}
Based on the LibriSpeech ASR corpus