E. Battenberg, C. Case, J. Casper, B. Catanzaro et al., Deep Speech 2: End-to-end speech recognition in English and Mandarin, 2015.

M. Ravanelli, T. Parcollet, and Y. Bengio, The PyTorch-Kaldi speech recognition toolkit, 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6465-6469, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02107617

S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba et al., ESPnet: End-to-end speech processing toolkit, 2018.

J. Li, V. Lavrukhin, B. Ginsburg, R. Leary, J. Kuchaiev et al., Jasper: An end-to-end convolutional neural acoustic model, 2019.

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek et al., The Kaldi speech recognition toolkit, IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, 2011.

M. Ravanelli, P. Brakel, M. Omologo, and Y. Bengio, Improving speech recognition by revising gated recurrent units, Proc. Interspeech, pp.1308-1312, 2017.

A. Graves and N. Jaitly, Towards end-to-end speech recognition with recurrent neural networks, International Conference on Machine Learning, pp.1764-1772, 2014.

Y. Zhang, M. Pezeshki, P. Brakel, S. Zhang, C. Laurent et al., Towards end-to-end speech recognition with deep convolutional neural networks, 2017.

S. Kim, T. Hori, and S. Watanabe, Joint CTC-attention based end-to-end speech recognition using multi-task learning, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4835-4839, 2017.

D. Palaz, R. Collobert, and M. Magimai-Doss, End-to-end phoneme sequence recognition using convolutional neural networks, 2013.

Z. Tüske, P. Golik, R. Schlüter, and H. Ney, Acoustic modeling with deep neural networks using raw time signal for LVCSR, Fifteenth Annual Conference of the International Speech Communication Association, pp.890-894, 2014.

Y. Hoshen, R. Weiss, and K. Wilson, Speech acoustic modeling from raw multichannel waveforms, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4624-4628, 2015.

N. Zeghidour, N. Usunier, G. Synnaeve, R. Collobert, and E. Dupoux, End-to-end speech recognition from the raw waveform, Proc. of Interspeech, pp.781-785, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01888739

M. Ravanelli and Y. Bengio, Speaker recognition from raw waveform with SincNet, Proc. of Spoken Language Technology Workshop (SLT), pp.1021-1028, 2018.

M. Ravanelli and Y. Bengio, Speech and speaker recognition from raw waveform with SincNet, 2018.

E. Loweimi, P. Bell, and S. Renals, On learning interpretable CNNs with parametric modulated kernel-based filters, Proc. of Interspeech, pp.3480-3484, 2019.

D. Gabor, Theory of communication, Journal of the Institution of Electrical Engineers, vol.93, pp.429-457, 1946.

S. Robertson, G. Penn, and Y. Wang, Improving speech recognition with drop-in replacements for f-bank features, Proc. of SLSP, pp.210-222, 2019.

N. Zeghidour, N. Usunier, I. Kokkinos, T. Schatz, G. Synnaeve, and E. Dupoux, Learning filterbanks from raw speech for phoneme recognition, Proc. of ICASSP, pp.5509-5513, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01888737

C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian et al., Deep complex networks, Proc. of ICLR, 2018.

H.-S. Choi, J.-H. Kim, J. Huh, A. Kim, J.-W. Ha, and K. Lee, Phase-aware speech enhancement with deep complex U-Net, Proc. of ICLR, 2019.

S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way, 2008.

G. Sommer, Geometric Computing with Clifford Algebras, 2001.

J. S. Garofolo, L. F. Lamel, and W. M. Fisher, DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST Speech Disc 1-1.1, NASA STI/Recon Technical Report, vol.93, 1993.

V. Nair and G. Hinton, Rectified linear units improve restricted Boltzmann machines, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp.807-814, 2010.

P. Bell and S. Renals, Regularization of context-dependent deep neural networks with context-independent multi-task training, Proc. of ICASSP, pp.4290-4294, 2015.