G. E. Dahl, D. Yu, L. Deng, and A. Acero, Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition, IEEE Transactions on Audio, Speech, and Language Processing, vol.20, issue.1, pp.30-42, 2012.
DOI : 10.1109/TASL.2011.2134090
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.227.8990

G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed et al., Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, IEEE Signal Processing Magazine, vol.29, issue.6, pp.82-97, 2012.
DOI : 10.1109/MSP.2012.2205597

A. Graves and N. Jaitly, Towards end-to-end speech recognition with recurrent neural networks, Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp.1764-1772, 2014.

A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos et al., Deep speech: Scaling up end-to-end speech recognition, 2014.

D. Yu, M. L. Seltzer, J. Li, J. Huang, and F. Seide, Feature learning in deep neural networks-studies on speech recognition tasks, 2013.

A. Mohamed, G. Hinton, and G. Penn, Understanding how Deep Belief Networks perform acoustic modelling, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4273-4276, 2012.
DOI : 10.1109/ICASSP.2012.6288863
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.224.2314

L. van der Maaten and G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research, vol.9, pp.2579-2605, 2008.

T. Nagamine, M. L. Seltzer, and N. Mesgarani, Exploring how deep neural networks form phonemic categories, Sixteenth Annual Conference of the International Speech Communication Association, 2015.

Y. LeCun and Y. Bengio, Convolutional networks for images, speech, and time series, The handbook of brain theory and neural networks, 1995.

O. Abdel-Hamid, A. Mohamed, H. Jiang, and G. Penn, Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4277-4280, 2012.
DOI : 10.1109/ICASSP.2012.6288864
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.224.2749

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS10), 2010.

J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu et al., Theano: a CPU and GPU math expression compiler, Proc. of the Python for Scientific Computing Conference (SciPy), 2010.

J. Gauvain, L. Lamel, and M. Eskenazi, Design considerations and text selection for BREF, a large French read-speech corpus, Proc. ICSLP-90, pp.1097-1100, 1990.

J. MacQueen, Some methods for classification and analysis of multivariate observations, Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp.281-297, 1967.

I. S. Dhillon, Y. Guan, and B. Kulis, Kernel k-means, Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '04, pp.551-556, 2004.
DOI : 10.1145/1014052.1014118

A. Y. Ng, M. I. Jordan, and Y. Weiss, On spectral clustering: Analysis and an algorithm, Advances in neural information processing systems, pp.849-856, 2002.

S. Mouysset, J. Noailles, and D. Ruiz, Using a Global Parameter for Gaussian Affinity Matrices in Spectral Clustering, High Performance Computing for Computational Science-VECPAR 2008, pp.378-390, 2008.
DOI : 10.1007/978-3-540-92859-1_34

R. Kannan, S. Vempala, and A. Vetta, On clusterings, Journal of the ACM, vol.51, issue.3, pp.497-515, 2004.
DOI : 10.1145/990308.990313

S. Mouysset, J. Noailles, D. Ruiz, and R. Guivarch, On a Strategy for Spectral Clustering with Parallel Computation, High Performance Computing for Computational Science-VECPAR 2010, pp.408-420, 2010.

S. Mallat, Understanding deep convolutional networks, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol.374, issue.2065, 2016.
DOI : 10.1098/rsta.2015.0203
URL : http://arxiv.org/abs/1601.04920

A. Radford, L. Metz, and S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, 2015.

C. Gendrot and M. Adda-Decker, Impact of duration on F1/F2 formant values of oral vowels: an automatic analysis of large broadcast news corpora in French and German, Variations, vol.25, issue.22, pp.2-4, 2005.
URL : https://hal.archives-ouvertes.fr/halshs-00188096