Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE transactions on acoustics, speech, and signal processing, vol.28, pp.357-366, 1980. ,
An efficient auditory filterbank based on the gammatone function, vol.2, 1987. ,
Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol.86, issue.11, pp.2278-2324, 1998. ,
Imagenet classification with deep convolutional neural networks, NIPS, 2012. ,
Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks, 2013. ,
Speech acoustic modeling from raw multichannel waveforms, Proceedings of ICASSP, 2015. ,
Learning the speech front-end with raw waveform cldnns, 2015. ,
Attention-based wav2text with feature transfer learning, 2017. ,
, Network in network, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01950552
Phone recognition with hierarchical convolutional deep maxout networks, EURASIP Journal on Audio, Speech, and Music Processing, vol.2015, issue.1, p.25, 2015. ,
Achieving human parity in conversational speech recognition, 2016. ,
DOI : 10.1109/taslp.2017.2756440
Deep scattering spectrum, IEEE Transactions on Signal Processing, vol.62, issue.16, pp.4114-4128, 2014. ,
Deep scattering spectrum with deep neural networks, ICASSP, 2014. ,
DOI : 10.1109/icassp.2014.6853588
A deep scattering spectrumdeep siamese network pipeline for unsupervised acoustic modeling, ICASSP, 2016. ,
DOI : 10.1109/icassp.2016.7472622
End-to-end phoneme sequence recognition using convolutional neural networks, 2013. ,
Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu, Wavenet: A generative model for raw audio, 2016. ,
Towards end-to-end speech recognition with deep convolutional neural networks, 2017. ,
DOI : 10.21437/interspeech.2016-1446
URL : http://arxiv.org/pdf/1701.02720
Attention-based models for speech recognition, NIPS ,
Segmental recurrent neural networks for endto-end speech recognition, 2016. ,
DOI : 10.21437/interspeech.2016-40
URL : http://arxiv.org/pdf/1603.00223
Timit acoustic-phonetic continuous speech corpus, Linguistic data consortium, vol.10, issue.5, p.0, 1993. ,
Dropout: a simple way to prevent neural networks from overfitting, Journal of machine learning research, vol.15, issue.1, pp.1929-1958, 2014. ,
Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, CVPR, 2015. ,
DOI : 10.1109/iccv.2015.123
URL : http://arxiv.org/pdf/1502.01852
Wav2letter: an end-to-end convnet-based speech recognition system, 2016. ,
Efficient auditory coding, Nature, vol.439, issue.7079, pp.978-982, 2006. ,
Parametric coding of speech spectra, The Journal of the Acoustical Society of America, vol.68, issue.2, pp.412-419, 1980. ,
DOI : 10.1121/1.384752