T. Kinnunen and H. Li, An overview of text-independent speaker recognition: From features to supervectors, Speech Communication, vol.52, issue.1, pp.12-40, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00587602

J. H. Hansen and T. Hasan, Speaker recognition by machines and humans: A tutorial review, IEEE Signal Processing Magazine, vol.32, issue.6, pp.74-99, 2015.

A. Poddar, M. Sahidullah, and G. Saha, Speaker verification with short utterances: a review of challenges, trends and opportunities, IET Biometrics, vol.7, issue.2, pp.91-101, 2017.

H. Zeinali, K. A. Lee, J. Alam, and L. Burget, Short-duration speaker verification (SdSV) challenge 2020: the challenge evaluation plan, 2020.

A. Larcher, K. A. Lee, B. Ma, and H. Li, Text-dependent speaker verification: Classifiers, databases and RSR2015, Speech Communication, vol.60, pp.56-77, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01926338

K. A. Lee, A. Larcher, G. Wang, P. Kenny, N. Brümmer et al., The RedDots data collection for speaker recognition, Proc. INTERSPEECH, pp.2996-3000, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01818427

A. K. Sarkar and Z. Tan, Text dependent speaker verification using unsupervised HMM-UBM and temporal GMM-UBM, Proc. INTERSPEECH, pp.425-429, 2016.

H. Zeinali, H. Sameti, and L. Burget, HMM-based phrase-independent i-vector extractor for text-dependent speaker verification, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue.7, pp.1421-1435, 2017.

H. Zeinali, L. Burget, H. Sameti, O. Glembek, and O. Plchot, Deep neural networks and hidden Markov models in i-vector-based text-dependent speaker verification, Proc. Odyssey: The Speaker and Language Recognition Workshop, pp.24-30, 2016.

S. Dey, P. Motlicek, S. Madikeri, and M. Ferras, Template-matching for text-dependent speaker verification, Speech Communication, vol.88, pp.96-105, 2017.

Q. He, G. W. Wornell, and W. Ma, A low-power text-dependent speaker verification system with narrow-band feature pre-selection and weighted dynamic time warping, Proc. Odyssey: The Speaker and Language Recognition Workshop, pp.1-8, 2016.

A. Sarkar and Z. Tan, Incorporating pass-phrase dependent background models for text-dependent speaker verification, Computer Speech & Language, vol.47, pp.259-271, 2018.

T. Kinnunen, M. Sahidullah, I. Kukanov, H. Delgado, M. Todisco et al., Utterance verification for text-dependent speaker recognition: A comparative assessment using the RedDots corpus, Proc. INTERSPEECH, pp.430-434, 2016.

D. Raj, D. Snyder, D. Povey, and S. Khudanpur, Probing the information encoded in x-vectors, Proc. 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp.726-733, 2019.

S. Wang, Y. Qian, and K. Yu, What does the speaker embedding encode?, Proc. INTERSPEECH, pp.1497-1501, 2017.

N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, Front-end factor analysis for speaker verification, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.4, pp.788-798, 2011.

D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, and S. Khudanpur, X-vectors: Robust DNN embeddings for speaker recognition, Proc. 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.5329-5333, 2018.

D. Reynolds, T. F. Quatieri, and R. Dunn, Speaker verification using adapted Gaussian mixture models, Digital Signal Processing, vol.10, issue.1-3, pp.19-41, 2000.

J. Hu, L. Shen, and G. Sun, Squeeze-and-excitation networks, Proc. 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.7132-7141, 2018.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, 2016.

H. Zhang, J. Xue, and K. Dana, Deep TEN: Texture encoding network, Proc. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.708-717, 2017.

W. Cai, Z. Cai, X. Zhang, X. Wang, and M. Li, A novel learnable dictionary encoding layer for end-to-end language identification, Proc. 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.5189-5193, 2018.

V. Vestman, K. A. Lee, and T. H. Kinnunen, Neural i-vectors, 2020.

D. Garcia-Romero and C. Y. Espy-Wilson, Analysis of i-vector length normalization in speaker recognition systems, Proc. INTERSPEECH, pp.249-252, 2011.

S. Cumani, P. D. Batzu, D. Colibro, C. Vair, P. Laface et al., Comparison of speaker recognition approaches for real applications, Proc. INTERSPEECH, pp.2365-2368, 2011.

M. Sahidullah and T. Kinnunen, Local spectral variability features for speaker verification, Digital Signal Processing, vol.50, pp.1-11, 2016.

S. Young, The HTK Book (for version 3.4), 2009.

A. K. Sarkar, Z. Tan, H. Tang, S. Shon, and J. R. Glass, Time-contrastive learning based deep bottleneck features for text-dependent speaker verification, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.27, pp.1267-1279, 2019.

A. Sarkar and Z. Tan, Time-contrastive learning based DNN bottleneck features for text-dependent speaker verification, Proc. NIPS Time Series Workshop, 2017.

H. Zeinali, H. Sameti, and T. Stafylakis, DeepMine speech processing database: Text-dependent and independent speaker verification and speech recognition in Persian and English, Proc. Odyssey: The Speaker and Language Recognition Workshop, pp.386-392, 2018.

H. Zeinali, L. Burget, and J. Cernocky, A multi purpose and large scale speech corpus in Persian and English for speaker and speech recognition: the DeepMine database, Proc. 2019 IEEE Automatic Speech Recognition and Understanding Workshop, pp.397-402, 2019.

A. Nagrani, J. S. Chung, and A. Zisserman, VoxCeleb: A large-scale speaker identification dataset, Proc. INTERSPEECH, pp.2616-2620, 2017.

J. S. Chung, A. Nagrani, and A. Zisserman, VoxCeleb2: deep speaker recognition, Proc. INTERSPEECH, pp.1086-1090, 2018.

D. Povey et al., The Kaldi speech recognition toolkit, Proc. 2011 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2011.

M. Sahidullah and G. Saha, Design, analysis and experimental evaluation of block based transformation in MFCC computation for speaker recognition, Speech Communication, vol.54, issue.4, pp.543-565, 2012.

A. , An introduction to computational networks and the computational network toolkit, 2016.

Z. Tan, A. K. Sarkar, and N. Dehak, rVAD: An unsupervised segment-based robust voice activity detection method, Computer Speech & Language, vol.59, pp.1-21, 2020.