Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition

Titouan Parcollet; Ying Zhang; Mohamed Morchid; Chiheb Trabelsi; Georges Linarès; Renato de Mori; Yoshua Bengio

doi:10.21437/Interspeech.2018-1898

Conference paper. Year: 2018

Titouan Parcollet

Role: Author
PersonId : 174514
IdHAL : titouan-parcollet
ORCID : 0000-0003-0672-1346

Laboratoire Informatique d'Avignon

Ying Zhang

Role: Author
PersonId : 1045971

Montreal Institute for Learning Algorithms [Montréal]

Mohamed Morchid

Role: Author
PersonId : 21451
IdHAL : morchid
ORCID : 0000-0002-4427-2468
IdRef : 188328343

Laboratoire Informatique d'Avignon

Chiheb Trabelsi

Role: Author
PersonId : 1045972

Montreal Institute for Learning Algorithms [Montréal]

Georges Linarès

Role: Author
PersonId : 4977
IdHAL : georges-linares
IdRef : 079368794

Laboratoire Informatique d'Avignon

Renato de Mori

Role: Author
PersonId : 981954

Laboratoire Informatique d'Avignon

Yoshua Bengio

Role: Author

Montreal Institute for Learning Algorithms [Montréal]

Abstract

Recently, the connectionist temporal classification (CTC) model, coupled with recurrent (RNN) or convolutional neural networks (CNN), has made it easier to train speech recognition systems in an end-to-end fashion. However, in real-valued models, time-frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first- and second-order derivatives, are processed as individual elements, while a natural alternative is to process such components as composed entities. We propose to group such elements in the form of quaternions and to process these quaternions using the established quaternion algebra. Quaternion numbers and quaternion neural networks have shown their efficiency in processing multidimensional inputs as entities, encoding internal dependencies, and solving many tasks with fewer learning parameters than real-valued models. This paper proposes to integrate multiple feature views in a quaternion-valued convolutional neural network (QCNN), to be used for sequence-to-sequence mapping with the CTC model. Promising results are reported using simple QCNNs in phoneme recognition experiments with the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme error rate (PER) with fewer learning parameters than a competing model based on real-valued CNNs.
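The grouping idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the (r, i, j, k) storage order, the zero real part, and the use of np.gradient as a stand-in for the first- and second-order derivatives are all assumptions made here for illustration.

```python
import numpy as np

def hamilton_product(p, q):
    """Hamilton product of quaternions stored as (..., 4) arrays in (r, i, j, k) order."""
    r1, i1, j1, k1 = np.moveaxis(np.asarray(p, dtype=float), -1, 0)
    r2, i2, j2, k2 = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    return np.stack([
        r1 * r2 - i1 * i2 - j1 * j2 - k1 * k2,  # real part
        r1 * i2 + i1 * r2 + j1 * k2 - k1 * j2,  # i component
        r1 * j2 - i1 * k2 + j1 * r2 + k1 * i2,  # j component
        r1 * k2 + i1 * j2 - j1 * i2 + k1 * r2,  # k component
    ], axis=-1)

def features_to_quaternions(energies):
    """Pack a (time, freq) energy matrix and its time derivatives into (time, freq, 4).

    Assumption: the real part is set to zero and the three imaginary parts hold
    the energy, its first derivative, and its second derivative along time.
    """
    energies = np.asarray(energies, dtype=float)
    d1 = np.gradient(energies, axis=0)  # stand-in for first-order deltas
    d2 = np.gradient(d1, axis=0)        # stand-in for second-order deltas
    return np.stack([np.zeros_like(energies), energies, d1, d2], axis=-1)

if __name__ == "__main__":
    i = np.array([0.0, 1.0, 0.0, 0.0])
    j = np.array([0.0, 0.0, 1.0, 0.0])
    print(hamilton_product(i, j))  # i * j = k
    frames = features_to_quaternions(np.random.rand(5, 3))
    print(frames.shape)  # one quaternion per time-frequency bin
```

In this sketch a single quaternion weight (four real numbers) acts on all four components of an input through the Hamilton product, whereas an unconstrained real-valued map between four inputs and four outputs would use sixteen weights. This gives one way to read the abstract's remark about fewer learning parameters.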

Keywords

Index Terms: quaternion convolutional neural networks, automatic speech recognition, deep learning

Domains

Computer Science [cs] / Artificial Intelligence [cs.AI]

Main file

1898.pdf (534.94 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-02107611

Submitted on: Tuesday, April 23, 2019, 17:07:49

Last modified on: Wednesday, November 3, 2021, 10:00:34

Dates and versions

hal-02107611 , version 1 (23-04-2019)

Identifiers

HAL Id : hal-02107611 , version 1
DOI : 10.21437/Interspeech.2018-1898

Cite

Titouan Parcollet, Ying Zhang, Mohamed Morchid, Chiheb Trabelsi, Georges Linarès, et al. Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition. Interspeech 2018, Sep 2018, Hyderabad, India. pp.22-26, ⟨10.21437/Interspeech.2018-1898⟩. ⟨hal-02107611⟩

Collections

UNIV-AVIGNON LIA
