Real to H-space Encoder for Speech Recognition

Titouan Parcollet; Mohamed Morchid; Georges Linarès; Renato de Mori

Conference paper. Year: 2019


Titouan Parcollet

Role: Corresponding author
PersonId : 174514
IdHAL : titouan-parcollet
ORCID : 0000-0003-0672-1346

Laboratoire Informatique d'Avignon

Mohamed Morchid

Role: Author
PersonId : 21451
IdHAL : morchid
ORCID : 0000-0002-4427-2468
IdRef : 188328343

Laboratoire Informatique d'Avignon

Georges Linarès

Role: Author
PersonId : 4977
IdHAL : georges-linares
IdRef : 079368794

Laboratoire Informatique d'Avignon

Renato de Mori

Role: Author

Laboratoire Informatique d'Avignon

McGill University = Université McGill [Montréal, Canada]

Abstract

Deep neural networks (DNNs), and more precisely recurrent neural networks (RNNs), are at the core of modern automatic speech recognition systems because they process input sequences efficiently. Recently, it has been shown that input representations based on multidimensional algebras, such as complex and quaternion numbers, give neural networks a more natural, compressive, and powerful representation of the input signal, and outperform common real-valued NNs. Indeed, quaternion-valued neural networks (QNNs) better learn both internal dependencies, such as the relation between the Mel-filter-bank value of a specific time frame and its time derivatives, and global dependencies, which describe the relations between time frames. Nonetheless, QNNs are limited to quaternion-valued input signals, and it is difficult to benefit from this representation with real-valued input data. This paper tackles this weakness by introducing a real-to-quaternion encoder that allows QNNs to process any one-dimensional input features, such as traditional Mel-filter-banks for automatic speech recognition.
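
The abstract does not describe how the encoder works internally, so the following is only an illustrative sketch of the general idea of mapping real-valued features into quaternion space. It assumes a real-valued affine projection, a tanh activation, and a normalization to unit quaternions; these choices, and the names `real_to_quaternion`, `weights`, `bias`, and `eps`, are hypothetical and are not taken from the paper.

```python
import math
import random


def real_to_quaternion(frame, weights, bias, eps=1e-8):
    """Map one real-valued feature vector to a list of unit quaternions.

    frame:   n_in floats (e.g. Mel-filter-bank energies of one time frame)
    weights: 4 * n_q rows of n_in floats (hypothetical trainable projection)
    bias:    4 * n_q floats
    Returns n_q quaternions as (r, i, j, k) tuples.
    """
    # Real-valued affine projection followed by tanh (both are assumptions).
    z = [
        math.tanh(sum(w * x for w, x in zip(row, frame)) + b)
        for row, b in zip(weights, bias)
    ]
    quaternions = []
    # Group every four consecutive outputs into one quaternion.
    for n in range(0, len(z), 4):
        r, i, j, k = z[n:n + 4]
        # Normalize to a unit quaternion; eps guards against division by zero.
        norm = math.sqrt(r * r + i * i + j * j + k * k) + eps
        quaternions.append((r / norm, i / norm, j / norm, k / norm))
    return quaternions


# Toy usage with random values standing in for real features and parameters.
random.seed(0)
n_in, n_q = 40, 16
frame = [random.gauss(0.0, 1.0) for _ in range(n_in)]
weights = [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(4 * n_q)]
bias = [0.0] * (4 * n_q)
quats = real_to_quaternion(frame, weights, bias)
print(len(quats), len(quats[0]))  # 16 4
```

In this sketch, a 40-dimensional real frame becomes 16 quaternions that a quaternion-valued layer could consume. In a trained system the projection parameters would be learned jointly with the rest of the network rather than drawn at random.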

Keywords

quaternion neural networks, recurrent neural networks, speech recognition

Domains

Computer Science [cs], Artificial Intelligence [cs.AI]

Main file

INTERSPEECH_2019___Projection_QLSTM-3.pdf (1.02 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-02158201

Submitted on: Monday, June 17, 2019, 22:15:39

Last modified on: Wednesday, November 3, 2021, 10:00:34

Dates and versions

hal-02158201 , version 1 (17-06-2019)

Identifiers

HAL Id : hal-02158201 , version 1

Cite

Titouan Parcollet, Mohamed Morchid, Georges Linarès, Renato de Mori. Real to H-space Encoder for Speech Recognition. INTERSPEECH 2019, Jun 2019, Graz, Austria. ⟨hal-02158201⟩

Collections

UNIV-AVIGNON LIA

40 views

43 downloads
