The kaldi speech recognition toolkit, IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, p.11, 2011. ,
The pytorch-kaldi speech recognition toolkit, Proc. of ICASSP, 2019. ,
URL : https://hal.archives-ouvertes.fr/hal-02107617
Towards end-to-end speech recognition with recurrent neural networks, International Conference on Machine Learning, pp.1764-1772, 2014. ,
Towards end-to-end speech recognition with deep convolutional neural networks, 2017. ,
Joint ctcattention based end-to-end speech recognition using multi-task learning, 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp.4835-4839, 2017. ,
Espresso: A fast endto-end neural speech recognition toolkit, 2019. ,
End-to-end speech recognition from the raw waveform, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01888739
Jasper: An end-to-end convolutional neural acoustic model, 2019. ,
End-to-end attention-based large vocabulary speech recognition, 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp.4945-4949, 2016. ,
Advances in joint ctc-attention based end-to-end speech recognition with a deep cnn encoder and rnn-lm, 2017. ,
Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.5884-5888, 2018. ,
Selfattention networks for connectionist temporal classification in speech recognition, ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.7115-7119, 2019. ,
End-to-end phoneme sequence recognition using convolutional neural networks, 2013. ,
Acoustic modeling with deep neural networks using raw time signal for lvcsr, Fifteenth annual conference of the international, 2014. ,
Speech acoustic modeling from raw multichannel waveforms, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4624-4628, 2015. ,
, Speaker recognition from raw waveform with sincnet, 2018.
Speech and speaker recognition from raw waveform with sincnet, 2018. ,
Espnet: End-to-end speech processing toolkit, 2018. ,
, Theory and applications of digital speech processing, vol.64, 2011.
Digital signal processing: a computer-based approach, vol.2, 2006. ,
On learning interpretable cnns with parametric modulated kernel-based filters, Proc. Interspeech, pp.3480-3484, 2019. ,
, Interpretable convolutional filters with sincnet, 2018.
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, Proceedings of the 23rd international conference on Machine learning, pp.369-376, 2006. ,
Effective approaches to attention-based neural machine translation, 2015. ,
Attention-based models for speech recognition, Advances in neural information processing systems, pp.577-585, 2015. ,
Darpa timit acoustic-phonetic continous speech corpus cd-rom. nist speech disc 1-1.1, NASA STI/Recon technical report n, vol.93, 1993. ,
Adadelta: an adaptive learning rate method, 2012. ,
Attention-based wav2text with feature transfer learning, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp.309-315, 2017. ,
Improving transformer-based end-to-end speech recognition with connectionist temporal classification and language model integration, 2019. ,