C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298594

P. Matejka, L. Zhang, T. Ng, H. S. Mallidi, O. Glembek et al., Neural network bottleneck features for language identification, Proc. IEEE Odyssey, pp.299-304, 2014.

L. Deng, J. Li, J. Huang, K. Yao, D. Yu et al., Recent advances in deep learning for speech research at Microsoft, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.8604-8608, 2013.
DOI : 10.1109/ICASSP.2013.6639345

F. Richardson, D. Reynolds, and N. Dehak, Deep Neural Network Approaches to Speaker and Language Recognition, IEEE Signal Processing Letters, vol.22, issue.10, p.1671, 2015.
DOI : 10.1109/LSP.2015.2420092

S. Ganapathy, K. Han, S. Thomas, M. Omar, M. Van Segbroeck et al., Robust language identification using convolutional neural network features, Proc. INTERSPEECH, 2014.

L. Uzan and L. Wolf, I know that voice: Identifying the voice actor behind the voice, 2015 International Conference on Biometrics (ICB), pp.46-51, 2015.
DOI : 10.1109/ICB.2015.7139074

D. Palaz and R. Collobert, Analysis of CNN-based speech recognition system using raw speech as input, Proc. INTERSPEECH, 2015.

L. Deng, O. Abdel-Hamid, and D. Yu, A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.6669-6673, 2013.
DOI : 10.1109/ICASSP.2013.6638952

H. Lee, P. Pham, Y. Largman, and A. Y. Ng, Unsupervised feature learning for audio classification using convolutional deep belief networks, Advances in Neural Information Processing Systems, pp.1096-1104, 2009.

A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos et al., Deep Speech: Scaling up end-to-end speech recognition, 2014.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, pp.1097-1105, 2012.

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 1409.

M. McLaren, Y. Lei, N. Scheffer, and L. Ferrer, Application of convolutional neural networks to speaker recognition in noisy conditions, Proc. INTERSPEECH, 2014.

N. Anand and P. Verma, Convoluted feelings: convolutional and recurrent nets for detecting emotion from audio data.

D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, Speaker verification using adapted Gaussian mixture models, Digital Signal Processing, vol.10, issue.1, pp.19-41, 2000.

N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, Front-End Factor Analysis for Speaker Verification, IEEE Transactions on Audio, Speech, and Language Processing, pp.788-798, 2011.
DOI : 10.1109/TASL.2010.2064307

P. Delacourt and C. J. Wellekens, DISTBIC: A speaker-based segmentation for audio data indexing, Speech Communication, vol.32, issue.1-2, pp.111-126, 2000.
DOI : 10.1016/S0167-6393(00)00027-3

S. O. Sadjadi, M. Slaney, and L. Heck, MSR Identity Toolbox v1.0: A MATLAB toolbox for speaker recognition research, Speech and Language Processing Technical Committee Newsletter, 2013.

D. Garcia-Romero and C. Y. Espy-Wilson, Analysis of i-vector length normalization in speaker recognition systems, Proc. INTERSPEECH, pp.249-252, 2011.

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long et al., Caffe, Proceedings of the ACM International Conference on Multimedia, MM '14, 2014.
DOI : 10.1145/2647868.2654889

M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, Computer Vision – ECCV 2014, pp.818-833, 2014.

A. Giraudel, M. Carré, V. Mapelli, J. Kahn, O. Galibert et al., The REPERE corpus: a multimodal corpus for person recognition, LREC, pp.1102-1107, 2012.

J. Pelecanos and S. Sridharan, Feature warping for robust speaker verification, IEEE Odyssey: The Speaker and Language Recognition Workshop, pp.213-218, 2001.

H. Aronowitz, Inter dataset variability compensation for speaker recognition, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4002-4006, 2014.
DOI : 10.1109/ICASSP.2014.6854353