Speech Recognition with Quaternion Neural Networks

Titouan Parcollet; Mirco Ravanelli; Mohamed Morchid; Georges Linares; Renato de Mori

Conference paper. Year: 2018

Titouan Parcollet

Role: Author
PersonId : 174514
IdHAL : titouan-parcollet
ORCID : 0000-0003-0672-1346

Laboratoire Informatique d'Avignon

Mirco Ravanelli

Role: Author

Montreal Institute for Learning Algorithms [Montréal]

Mohamed Morchid

Role: Author
PersonId : 21451
IdHAL : morchid
ORCID : 0000-0002-4427-2468
IdRef : 188328343

Laboratoire Informatique d'Avignon

Georges Linares

Role: Author
PersonId : 4977
IdHAL : georges-linares
IdRef : 079368794

Laboratoire Informatique d'Avignon

Renato de Mori

Role: Author

Laboratoire Informatique d'Avignon

McGill University = Université McGill [Montréal, Canada]

Abstract

Neural network architectures are at the core of powerful automatic speech recognition (ASR) systems. However, while recent research focuses on novel model architectures, the acoustic input features remain almost unchanged. Traditional ASR systems rely on multidimensional acoustic features, such as Mel filter bank energies together with their first- and second-order derivatives, to characterize the time frames that compose the signal sequence. Because these components describe three different views of the same element, neural networks have to learn both the internal relations that exist within these features and the external, or global, dependencies that exist between time frames. Quaternion-valued neural networks (QNNs) have recently received significant interest from researchers as a way to process and learn such relations in multidimensional spaces. Indeed, quaternion numbers and QNNs have been shown to process multidimensional inputs efficiently as single entities, to encode internal dependencies, and to solve many tasks with up to four times fewer learning parameters than real-valued models. We propose to investigate modern quaternion-valued models, such as convolutional and recurrent quaternion neural networks, in the context of speech recognition with the TIMIT dataset. The experiments show that QNNs always outperform equivalent real-valued models with far fewer free parameters, leading to a more efficient, compact, and expressive representation of the relevant information.
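
The two ideas in the abstract, treating the three views of a time-frequency bin as one entity and saving up to a factor of four in parameters, can be illustrated with a minimal sketch. It is not taken from the paper: the names `Quaternion`, `hamilton_product`, `pack_frame_feature`, and the two parameter-counting helpers are hypothetical. Placing the energy and its two derivatives in the three imaginary parts with a zero real part is an assumption made here for illustration, and the count ignores biases.

```python
from typing import NamedTuple


class Quaternion(NamedTuple):
    r: float  # real part
    i: float  # first imaginary part
    j: float  # second imaginary part
    k: float  # third imaginary part


def hamilton_product(a: Quaternion, b: Quaternion) -> Quaternion:
    """Return the Hamilton product a * b (note: not commutative)."""
    return Quaternion(
        a.r * b.r - a.i * b.i - a.j * b.j - a.k * b.k,
        a.r * b.i + a.i * b.r + a.j * b.k - a.k * b.j,
        a.r * b.j - a.i * b.k + a.j * b.r + a.k * b.i,
        a.r * b.k + a.i * b.j - a.j * b.i + a.k * b.r,
    )


def pack_frame_feature(energy: float, delta: float, delta2: float) -> Quaternion:
    """Pack one filter bank energy and its two derivatives into one quaternion.

    Assumption for illustration: the real part is left at zero.
    """
    return Quaternion(0.0, energy, delta, delta2)


def real_dense_params(n_in: int, n_out: int) -> int:
    """Weights in a real-valued dense layer (biases ignored)."""
    return n_in * n_out


def quaternion_dense_params(n_in: int, n_out: int) -> int:
    """Real-valued weights in a quaternion dense layer with the same
    number of real input and output units (biases ignored).

    Each quaternion weight holds 4 real numbers and connects one
    quaternion input (4 real units) to one quaternion output (4 real units).
    """
    if n_in % 4 or n_out % 4:
        raise ValueError("unit counts must be multiples of 4")
    return 4 * (n_in // 4) * (n_out // 4)
```

For a layer with 1024 real input units and 1024 real output units, `real_dense_params(1024, 1024)` gives 1,048,576 weights while `quaternion_dense_params(1024, 1024)` gives 262,144, which is the factor of four mentioned in the abstract. The saving comes from the Hamilton product reusing the same four real numbers of each weight across all four components of the input.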

Domains

Computer Science [cs] / Artificial Intelligence [cs.AI]

Main file

1811.09678.pdf (850.75 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-02107651

Submitted on: Tuesday, April 23, 2019, 17:43:45

Last modified on: Wednesday, November 3, 2021, 09:59:44

Dates and versions

hal-02107651 , version 1 (23-04-2019)

Identifiers

HAL Id : hal-02107651 , version 1
ARXIV : 1811.09678

Cite

Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linares, Renato de Mori. Speech Recognition with Quaternion Neural Networks. NIPS 2018 - IRASL Workshop, Dec 2018, Montréal, Canada. ⟨hal-02107651⟩

Collections

UNIV-AVIGNON LIA
