Learning Filterbanks from Raw Speech for Phoneme Recognition

Abstract: We train a bank of complex filters that operates on the raw waveform and is fed into a convolutional neural network for end-to-end phone recognition. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of mel-filterbanks, and then fine-tuned jointly with the remaining convolutional architecture. We perform phone recognition experiments on TIMIT and show that for several architectures, models trained on TD-filterbanks consistently outperform their counterparts trained on comparable mel-filterbanks. We get our best performance by learning all front-end steps, from pre-emphasis up to averaging. Finally, we observe that the filters at convergence have an asymmetric impulse response, and that some of them remain almost analytic.
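The front-end steps named in the abstract (pre-emphasis, complex filtering of the raw waveform, averaging) can be illustrated with a minimal NumPy sketch of the initialization and forward pass only. Everything specific here is an assumption made for illustration and is not taken from this record: the Gabor form of the filters, the function names `gabor_filterbank` and `td_filterbank_features`, 40 filters, a 25 ms window and 10 ms hop at 16 kHz, the 0.97 pre-emphasis coefficient, the Hann averaging window, and the `log(1 + x)` compression. No training is shown; in the paper these parameters are learned jointly with the network.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def gabor_filterbank(n_filters=40, width=400, sample_rate=16000,
                     f_min=0.0, f_max=8000.0):
    """Complex Gabor filters with mel-spaced centre frequencies (assumed form)."""
    # n_filters + 2 points evenly spaced on the mel scale, as for triangular mel filters.
    mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    centers = hz_points[1:-1]
    bandwidths = hz_points[2:] - hz_points[:-2]
    # Time axis in seconds, centred on the middle of the window.
    t = (np.arange(width) - width // 2) / sample_rate
    # Illustrative choice: envelope width inversely proportional to the bandwidth.
    sigma_t = 1.0 / (np.pi * bandwidths)
    envelope = np.exp(-0.5 * (t[None, :] / sigma_t[:, None]) ** 2)
    carrier = np.exp(2j * np.pi * centers[:, None] * t[None, :])
    filters = envelope * carrier
    # Unit L2 norm per filter.
    return filters / np.linalg.norm(filters, axis=1, keepdims=True)


def td_filterbank_features(waveform, filters, hop=160, preemph=0.97):
    """Pre-emphasis, complex filtering, squared modulus, averaging, log compression."""
    x = np.asarray(waveform, dtype=np.float64)
    # Pre-emphasis: y[n] = x[n] - preemph * x[n - 1].
    x = np.append(x[0], x[1:] - preemph * x[:-1])
    # Low-pass averaging window (assumed Hann), normalized to sum to one.
    window = np.hanning(filters.shape[1])
    window /= window.sum()
    features = []
    for f in filters:
        response = np.convolve(x, f, mode="same")        # complex band-pass output
        energy = np.abs(response) ** 2                    # squared modulus
        smoothed = np.convolve(energy, window, mode="same")
        features.append(np.log1p(smoothed[::hop]))        # decimate, then compress
    return np.stack(features)
```

A possible usage, on one second of synthetic noise at 16 kHz, would be `feats = td_filterbank_features(np.random.default_rng(0).standard_normal(16000), gabor_filterbank())`, which yields one row per filter and one column per 10 ms frame. In a learned front-end, the arrays built here would serve only as initial values for trainable convolution weights.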
Document type:
Conference paper
ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2018, Calgary, Alberta, Canada. Proceedings of ICASSP 2018


https://hal.archives-ouvertes.fr/hal-01888737
Contributor: Emmanuel Dupoux
Submitted on: Friday, December 7, 2018 - 14:42:43
Last modified on: Thursday, February 7, 2019 - 16:58:41

File

1711.01161.pdf
Files produced by the author(s)


Citation

Neil Zeghidour, Nicolas Usunier, Iasonas Kokkinos, Thomas Schatz, Gabriel Synnaeve, et al. Learning Filterbanks from Raw Speech for Phoneme Recognition. ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2018, Calgary, Alberta, Canada. Proceedings of ICASSP 2018. 〈hal-01888737〉

