Conference paper. Year: 2020

Quaternion Neural Networks for Multi-Channel Distant Speech Recognition

Xinchi Qiu
Titouan Parcollet

Abstract

Despite the significant progress in automatic speech recognition (ASR), distant ASR remains challenging due to noise and reverberation. A common approach to mitigating this issue consists of equipping the recording devices with multiple microphones that capture the acoustic scene from different perspectives. These multi-channel audio recordings contain specific internal relations between each signal. In this paper, we propose to capture these inter- and intra-structural dependencies with quaternion neural networks, which can jointly process multiple signals as whole quaternion entities. The quaternion algebra replaces the standard dot product with the Hamilton product, thus offering a simple and elegant way to model dependencies between elements. The quaternion layers are then coupled with a recurrent neural network, which can learn long-term dependencies in the time domain. We show that a quaternion long short-term memory neural network (QLSTM), trained on the concatenated multi-channel speech signals, outperforms an equivalent real-valued LSTM on two different tasks of multi-channel distant speech recognition.
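The Hamilton product mentioned in the abstract is the quaternion multiplication rule that mixes all four components of each operand, which is what lets a quaternion layer model dependencies between the channels grouped into one quaternion. A minimal sketch (plain Python, not the paper's implementation; the tuple-based representation is an illustrative choice):

```python
def hamilton_product(q1, q2):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples.

    Unlike a component-wise product, every output component mixes all
    four input components, so a quaternion weight acts jointly on the
    four signals packed into one quaternion entity.
    """
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return (
        w1*w2 - x1*x2 - y1*y2 - z1*z2,  # real part
        w1*x2 + x1*w2 + y1*z2 - z1*y2,  # i component
        w1*y2 - x1*z2 + y1*w2 + z1*x2,  # j component
        w1*z2 + x1*y2 - y1*x2 + z1*w2,  # k component
    )

# Basis identities: i*j = k while j*i = -k (the product is non-commutative)
i, j = (0, 1, 0, 0), (0, 0, 1, 0)
print(hamilton_product(i, j))  # (0, 0, 0, 1), i.e. k
```

Because one quaternion weight (four real parameters) replaces what would otherwise be a 4x4 real weight block, quaternion layers of this kind typically use about four times fewer parameters than their real-valued counterparts.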

Dates and versions

hal-03601248 , version 1 (14-03-2022)

Cite

Xinchi Qiu, Titouan Parcollet, Mirco Ravanelli, Nicholas Lane, Mohamed Morchid. Quaternion Neural Networks for Multi-Channel Distant Speech Recognition. Interspeech 2020, Oct 2020, Shanghai, China. pp.329-333, ⟨10.21437/interspeech.2020-1682⟩. ⟨hal-03601248⟩