A list of notable research works on speaker recognition, identification, and verification.
- Speaker Verification Using Adapted Gaussian Mixture Models, Reynolds et al. 2000 (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.117.338&rep=rep1&type=pdf)
- Front-end factor analysis for speaker verification, Dehak et al. 2010 (https://ieeexplore.ieee.org/document/5545402)
- Channel robust speaker verification via feature mapping, Reynolds 2003, ICASSP (https://ieeexplore.ieee.org/abstract/document/1202292/)
- Multi-Channel Speaker Verification for Single and Multi-talker Speech, Kataria et al. 2021 (https://arxiv.org/abs/2010.12692)
- Speaker recognition from raw waveform with SincNet, Ravanelli et al. 2018 (https://arxiv.org/abs/1808.00158)
- Graph Attention Networks for Speaker Verification, Jung et al. 2020 (https://arxiv.org/abs/2010.11543)
- Ferrer, Luciana, Mitchell McLaren, and Niko Brummer. "A Speaker Verification Backend with Robust Performance across Conditions." arXiv preprint arXiv:2102.01760 (2021). (https://arxiv.org/abs/2102.01760)
- Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA?, Wang et al. 2022, Interspeech 2022 (https://www.isca-speech.org/archive/interspeech_2022/wang22r_interspeech.html)
- AutoSpeech: Neural Architecture Search for Speaker Recognition, Ding et al. 2020 (https://arxiv.org/abs/2005.03215)
- Pushing the limits of raw waveform speaker recognition, Jung et al. 2022 (https://arxiv.org/abs/2203.08488)
- Exploring the encoding layer and loss function in end-to-end speaker and language recognition system, Cai et al. 2018 (https://arxiv.org/abs/1804.05160)
- Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition, Xiang et al. 2019 (https://arxiv.org/abs/1906.07317)
- Garcia-Romero, Daniel, Gregory Sell, and Alan McCree. "Magneto: X-vector magnitude estimation network plus offset for improved speaker recognition." Proc. Odyssey 2020 The Speaker and Language Recognition Workshop. 2020. (https://www.isca-speech.org/archive/Odyssey_2020/pdfs/65.pdf)
- Deep Speaker: an End-to-End Neural Speaker Embedding System, Li et al. 2017 (https://arxiv.org/abs/1705.02304)
- Learning Speaker Embedding with Momentum Contrast, Ding et al. 2020 (https://arxiv.org/abs/2001.01986)
- WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing, Chen et al. 2021 (https://arxiv.org/abs/2110.13900)
- VoiceID Loss: Speech Enhancement for Speaker Verification, Shon et al. 2019 (https://arxiv.org/abs/1904.03601)
- Feature enhancement with deep feature losses for speaker verification, Kataria et al. 2019 (https://arxiv.org/abs/1910.11905)
- Cycle-GANs for domain adaptation of acoustic features for speaker recognition, Nidadavolu et al. 2019 (https://ieeexplore.ieee.org/document/8683055)
- A Study of Multimodal Person Verification Using Audio-Visual-Thermal Data, Abdrakhmanova et al. 2021 (https://arxiv.org/abs/2110.12136)
- Face-Mic: inferring live speech and speaker identity via subtle facial dynamics captured by AR/VR motion sensors, Shi et al. 2021 (https://dl.acm.org/doi/abs/10.1145/3447993.3483272)
- The BOSARIS toolkit: Theory, algorithms and code for surviving the new DCF, Brummer et al., 2013 (https://arxiv.org/abs/1304.2865)
- Beijing ZKJ-NPU Speaker Verification System for VoxCeleb Speaker Recognition Challenge 2021, Zhang et al., 2021 (https://arxiv.org/abs/2109.03568)
- Fan, Yue, et al. "CN-CELEB: a challenging Chinese speaker recognition dataset." ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020. (https://ieeexplore.ieee.org/abstract/document/9054017)
- Villalba, J. Advances on speaker recognition in non collaborative environments. Ph.D. dissertation, University of Zaragoza, 2014.
- Brummer, Niko. Measuring, refining and calibrating speaker and language information extracted from speech. Ph.D. dissertation, University of Stellenbosch, 2010. (http://scholar.sun.ac.za/handle/10019.1/5139)
- Mak, Man-Wai, and Jen-Tzung Chien. Machine learning for speaker recognition. Cambridge University Press, 2020. (http://www.eie.polyu.edu.hk/~mwmak/papers/spkver-book_toc.pdf)
- Hyperion, Villalba et al., 2019 (https://github.com/jsalt2019-diadet/hyperion/tree/14a11436d62f3c15cd9b1f70bcce3eafbea2f753)
- SpeechBrain, Ravanelli et al., 2021 (https://github.com/speechbrain/speechbrain)
- Angular Prototypical Loss, Chung et al. 2020 (https://arxiv.org/abs/2003.11982)
- BOSARIS, multiple versions (https://github.com/bsxfan/PYLLR, https://projets-lium.univ-lemans.fr/sidekit/api/bosaris/index.html, https://gitlab.eurecom.fr/nautsch/pybosaris)