CGCNN: COMPLEX GABOR CONVOLUTIONAL NEURAL NETWORK ON RAW SPEECH

Paul-Gauthier Noé; Titouan Parcollet; Mohamed Morchid

Conference paper, Year: 2020

Paul-Gauthier Noé

Role: Author
PersonId : 182438
IdHAL : paul-gauthier-noe
ORCID : 0000-0002-2304-9830

Laboratoire Informatique d'Avignon

Titouan Parcollet

Role: Author
PersonId : 174514
IdHAL : titouan-parcollet
ORCID : 0000-0003-0672-1346

Laboratoire Informatique d'Avignon

Mohamed Morchid

Role: Author
PersonId : 21451
IdHAL : morchid
ORCID : 0000-0002-4427-2468
IdRef : 188328343

Laboratoire Informatique d'Avignon

Abstract

Convolutional Neural Networks (CNNs) have been used in Automatic Speech Recognition (ASR) to learn representations directly from the raw signal instead of hand-crafted acoustic features, providing a richer and lossless input signal. Recent research proposes injecting prior acoustic knowledge into the first convolutional layer by integrating the shape of the impulse responses, in order to increase both the interpretability of the learnt acoustic model and its performance. We propose to combine the complex Gabor filter with complex-valued deep neural networks, replacing the usual CNN weight kernels, to take full advantage of the filter's optimal time-frequency resolution and of the complex domain. Experiments on the TIMIT phoneme recognition task show that the proposed approach reaches top-of-the-line performance while remaining interpretable.
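To make the central object of the abstract concrete, the following is a minimal sketch of a complex Gabor kernel applied to a raw waveform. It uses the generic textbook form (a Gaussian envelope multiplied by a complex sinusoid) and is not taken from the paper: the function names, the parameterization by `center_hz` and `sigma_s`, and the plain-Python correlation are all assumptions made for illustration, and the paper's actual parameterization and training setup may differ.

```python
import cmath
import math


def complex_gabor_kernel(center_hz, sigma_s, sample_rate_hz, length):
    """Sample g(t) = exp(-t^2 / (2 sigma^2)) * exp(j 2 pi f t) around t = 0.

    Hypothetical helper: center_hz (f) and sigma_s (sigma, in seconds) stand in
    for the two parameters a learnable Gabor layer would adjust per filter.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd so the kernel is centred on t = 0")
    half = length // 2
    kernel = []
    for n in range(-half, half + 1):
        t = n / sample_rate_hz
        envelope = math.exp(-(t * t) / (2.0 * sigma_s * sigma_s))  # Gaussian window
        carrier = cmath.exp(2j * math.pi * center_hz * t)          # complex sinusoid
        kernel.append(envelope * carrier)
    return kernel


def complex_conv_valid(signal, kernel):
    """Unpadded sliding dot product (cross-correlation, as in CNN layers)
    between a real-valued signal and a complex kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


if __name__ == "__main__":
    sample_rate = 16000
    # A 1 kHz test tone, used here only as a stand-in for raw speech samples.
    tone = [math.cos(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(512)]
    for center in (1000.0, 3000.0):
        kernel = complex_gabor_kernel(center, 0.002, sample_rate, 257)
        response = complex_conv_valid(tone, kernel)
        print(center, max(abs(v) for v in response))
```

Under these assumptions, the filter centred on the tone's frequency responds far more strongly than the one centred elsewhere. This is the interpretability argument in miniature: each filter is fully described by a centre frequency and a bandwidth-like width, rather than by an unconstrained vector of kernel weights.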

Keywords

SincNet; complex neural networks; Gabor filters; speech recognition

Domains

Signal and image processing [eess.SP]

Main file

gabor_complex_cnn_final.pdf (317.03 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-02474746

Submitted on: Tuesday, 11 February 2020, 15:46:50

Last modified on: Monday, 23 May 2022, 03:26:02

Long-term archiving on: Tuesday, 12 May 2020, 15:18:43

Dates and versions

hal-02474746 , version 1 (11-02-2020)

Identifiers

HAL Id : hal-02474746 , version 1

Cite

Paul-Gauthier Noé, Titouan Parcollet, Mohamed Morchid. CGCNN: COMPLEX GABOR CONVOLUTIONAL NEURAL NETWORK ON RAW SPEECH. ICASSP 2020, May 2020, Barcelona, Spain. ⟨hal-02474746⟩

Collections

UNIV-AVIGNON LIA

245 views

232 downloads
