Learning Filterbanks from Raw Speech for Phoneme Recognition

Neil Zeghidour; Nicolas Usunier; Iasonas Kokkinos; Thomas Schatz; Gabriel Synnaeve; Emmanuel Dupoux

Conference paper, Year: 2018


Neil Zeghidour

Role: Author

Université Paris-Saclay

Apprentissage machine et développement cognitif

Nicolas Usunier

Role: Author
PersonId : 948794

Facebook AI Research [Paris]

Iasonas Kokkinos

Role: Author
PersonId : 865671

Organ Modeling through Extraction, Representation and Understanding of Medical Image Content

Thomas Schatz

Role: Author

Laboratoire de sciences cognitives et psycholinguistique

Gabriel Synnaeve

Role: Author
PersonId : 883104

Laboratoire d'Informatique de Grenoble

Emmanuel Dupoux

Role: Author
PersonId : 757939
ORCID : 0000-0002-7814-2952

Laboratoire de sciences cognitives et psycholinguistique

Apprentissage machine et développement cognitif

Abstract

We train a bank of complex filters that operates on the raw waveform and is fed into a convolutional neural network for end-to-end phone recognition. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of mel-filterbanks, and then fine-tuned jointly with the remaining convolutional architecture. We perform phone recognition experiments on TIMIT and show that for several architectures, models trained on TD-filterbanks consistently outperform their counterparts trained on comparable mel-filterbanks. We get our best performance by learning all front-end steps, from pre-emphasis up to averaging. Finally, we observe that the filters at convergence have an asymmetric impulse response, and that some of them remain almost analytic.
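To make the pipeline in the abstract concrete, here is a minimal NumPy sketch of a fixed (untrained) time-domain front end: pre-emphasis, a bank of complex filters, squared modulus, low-pass averaging, and log compression. It is an illustration under stated assumptions, not the authors' implementation. The Gabor-shaped filters, the function names, the 40 filters, the 25 ms window, the 10 ms stride, the 0.97 pre-emphasis coefficient, the squared Hanning window, and the log(1 + x) compression are all choices made for this example and are not given in the abstract. In the paper these stages serve only as an initialization and are then learned jointly with the network, which this sketch does not do.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def gabor_filterbank(n_filters=40, width=401, sample_rate=16000):
    """Complex Gabor filters with mel-spaced center frequencies (assumed design)."""
    # n_filters + 2 band edges evenly spaced on the mel scale, as for triangular mel filters.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2))
    centers = edges[1:-1]
    bandwidths = edges[2:] - edges[:-2]  # base width of each triangular band, in Hz
    t = (np.arange(width) - width // 2) / sample_rate  # time axis in seconds, centered on 0
    # Heuristic: pick the Gaussian time spread so that the amplitude response has a
    # full width at half maximum of half the triangular base width.
    sigma = 2.0 * np.sqrt(2.0 * np.log(2.0)) / (np.pi * bandwidths)
    envelope = np.exp(-t[None, :] ** 2 / (2.0 * sigma[:, None] ** 2))
    carrier = np.exp(2j * np.pi * centers[:, None] * t[None, :])
    filters = envelope * carrier
    filters /= np.linalg.norm(filters, axis=1, keepdims=True)  # unit L2 norm per filter
    return filters, centers

def td_filterbank_features(wave, filters, pool_width=400, stride=160, preemph=0.97):
    """Fixed front end: pre-emphasis, complex filtering, |.|^2, averaging, log."""
    # Pre-emphasis: first-order high-pass filter.
    x = np.append(wave[0], wave[1:] - preemph * wave[:-1])
    # Complex convolution of the waveform with each filter.
    responses = np.stack([np.convolve(x, f, mode="valid") for f in filters])
    energy = np.abs(responses) ** 2  # squared modulus
    # Low-pass averaging with a normalized squared Hanning window, applied with a stride.
    window = np.hanning(pool_width) ** 2
    window /= window.sum()
    n_frames = 1 + (energy.shape[1] - pool_width) // stride
    frames = np.stack(
        [energy[:, i * stride:i * stride + pool_width] @ window for i in range(n_frames)],
        axis=1,
    )
    return np.log1p(frames)  # log compression, shape (n_filters, n_frames)

# Usage: one second of a pure tone placed at the center frequency of filter 20.
filters, centers = gabor_filterbank()
time = np.arange(16000) / 16000.0
tone = np.cos(2.0 * np.pi * centers[20] * time)
feats = td_filterbank_features(tone, filters)
strongest = int(np.argmax(feats.mean(axis=1)))  # index of the most active filter
```

In a trainable version, the filter coefficients, the pre-emphasis coefficient, and the averaging window would be parameters of convolutional layers updated by gradient descent; the arrays above would only provide their initial values.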

Domains

Computation and Language [cs.CL]; Cognitive science; Linguistics; Sound [cs.SD]

Main file

1711.01161.pdf (754.74 KB)

Origin: Files produced by the author(s)

Contributor: Emmanuel Dupoux

https://hal.science/hal-01888737

Submitted on: Friday, December 7, 2018, 14:42:43

Last modified on: Friday, April 19, 2024, 16:18:55

Long-term archiving on: Friday, March 8, 2019, 14:30:20

Dates and versions

hal-01888737, version 1 (07-12-2018)

Identifiers

HAL Id: hal-01888737, version 1
arXiv: 1711.01161v2

Cite

Neil Zeghidour, Nicolas Usunier, Iasonas Kokkinos, Thomas Schatz, Gabriel Synnaeve, et al. Learning Filterbanks from Raw Speech for Phoneme Recognition. ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2018, Calgary, Alberta, Canada. ⟨hal-01888737⟩


Collections

ENS-PARIS UGA CNRS INRIA EHESS LIG LSCP DEC CVN CENTRALESUPELEC INRIA2 PSL UNIV-PARIS-SACLAY GS-ENGINEERING GS-COMPUTER-SCIENCE LIG_SIDCH
