S. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.28, pp.357-366, 1980.

R. D. Patterson, I. Nimmo-Smith, J. Holdsworth, and P. Rice, An efficient auditory filterbank based on the gammatone function, vol.2, 1987.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, NIPS, 2012.

D. Palaz, R. Collobert, and M. Magimai-Doss, Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks, 2013.

Y. Hoshen, R. J. Weiss, and K. W. Wilson, Speech acoustic modeling from raw multichannel waveforms, Proceedings of ICASSP, 2015.

T. N. Sainath, R. J. Weiss, A. Senior, K. W. Wilson, and O. Vinyals, Learning the speech front-end with raw waveform CLDNNs, 2015.

A. Tjandra, S. Sakti, and S. Nakamura, Attention-based wav2text with feature transfer learning, 2017.

M. Lin, Q. Chen, and S. Yan, Network in network, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01950552

L. Tóth, Phone recognition with hierarchical convolutional deep maxout networks, EURASIP Journal on Audio, Speech, and Music Processing, vol.2015, issue.1, p.25, 2015.

W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer et al., Achieving human parity in conversational speech recognition, 2016.
DOI : 10.1109/taslp.2017.2756440

J. Andén and S. Mallat, Deep scattering spectrum, IEEE Transactions on Signal Processing, vol.62, issue.16, pp.4114-4128, 2014.

V. Peddinti, T. Sainath, S. Maymon, B. Ramabhadran, D. Nahamoo et al., Deep scattering spectrum with deep neural networks, ICASSP, 2014.
DOI : 10.1109/icassp.2014.6853588

N. Zeghidour, G. Synnaeve, M. Versteegh, and E. Dupoux, A deep scattering spectrum - deep Siamese network pipeline for unsupervised acoustic modeling, ICASSP, 2016.
DOI : 10.1109/icassp.2016.7472622

D. Palaz, R. Collobert, and M. Magimai-Doss, End-to-end phoneme sequence recognition using convolutional neural networks, 2013.

A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, WaveNet: A generative model for raw audio, 2016.

Y. Zhang, M. Pezeshki, P. Brakel, S. Zhang, C. Laurent, Y. Bengio et al., Towards end-to-end speech recognition with deep convolutional neural networks, 2017.
DOI : 10.21437/interspeech.2016-1446
URL : http://arxiv.org/pdf/1701.02720

J. K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, Attention-based models for speech recognition, NIPS, 2015.

L. Lu, L. Kong, C. Dyer, N. A. Smith, and S. Renals, Segmental recurrent neural networks for end-to-end speech recognition, 2016.
DOI : 10.21437/interspeech.2016-40
URL : http://arxiv.org/pdf/1603.00223

J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett et al., TIMIT acoustic-phonetic continuous speech corpus, Linguistic Data Consortium, vol.10, issue.5, p.0, 1993.

N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, ICCV, 2015.
DOI : 10.1109/iccv.2015.123
URL : http://arxiv.org/pdf/1502.01852

R. Collobert, C. Puhrsch, and G. Synnaeve, Wav2letter: an end-to-end convnet-based speech recognition system, 2016.

E. C. Smith and M. S. Lewicki, Efficient auditory coding, Nature, vol.439, issue.7079, pp.978-982, 2006.

J. L. Flanagan, Parametric coding of speech spectra, The Journal of the Acoustical Society of America, vol.68, issue.2, pp.412-419, 1980.
DOI : 10.1121/1.384752