COMPARING GRU AND LSTM FOR AUTOMATIC SPEECH RECOGNITION

Abstract: This paper compares Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) networks for speech recognition acoustic models. While these recurrent models were mainly proposed for simple read-speech tasks, we experiment on a large-vocabulary continuous speech recognition task: the transcription of TED talks. In addition to being simpler than LSTM, GRU networks outperform LSTM at every network depth we tested. We also propose a new model, termed DNN-BGRU-DNN, which stacks a Deep Neural Network (DNN), a Bidirectional GRU (BGRU), and another DNN. The first DNN acts as a feature processor, the BGRU stores temporal contextual information, and the final DNN introduces additional non-linearity. Our best model achieved 13.35% WER on the TEDLIUM dataset, a 16.66% and 17.84% relative improvement over the baseline HMM-DNN and HMM-SGMM models, respectively.
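
As an illustration of the DNN-BGRU-DNN architecture described in the abstract, here is a minimal PyTorch sketch. The framework choice, layer widths, depth, and senone count are assumptions made for this example only; the report does not specify this code.

import torch
import torch.nn as nn

class DNNBGRUDNN(nn.Module):
    """Illustrative sketch of the DNN-BGRU-DNN acoustic model.

    All sizes are assumed for the example, not taken from the report.
    """
    def __init__(self, n_feats=40, hidden=512, n_senones=4000):
        super().__init__()
        # First DNN: acts as a feature processor.
        self.front = nn.Sequential(
            nn.Linear(n_feats, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Bidirectional GRU: stores temporal contextual information.
        self.bgru = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        # Final DNN: introduces additional non-linearity before the output layer.
        self.back = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_senones),
        )

    def forward(self, x):
        # x: (batch, time, n_feats) acoustic feature frames
        h = self.front(x)
        h, _ = self.bgru(h)
        return self.back(h)  # per-frame senone logits

model = DNNBGRUDNN()
logits = model(torch.randn(8, 100, 40))  # 8 utterances, 100 frames each
print(logits.shape)  # torch.Size([8, 100, 4000])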
Document type: Report
[Research Report] LIG. 2016

Cited literature: 25 references

https://hal.archives-ouvertes.fr/hal-01633254
Contributor: Benjamin Lecouteux
Submitted on: Sunday, November 12, 2017 - 08:02:09
Last modified on: Thursday, October 11, 2018 - 08:48:03
Document(s) archived on: Tuesday, February 13, 2018 - 12:29:25

File

comparing-gru-lstm.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01633254, version 1

Citation

Shubham Khandelwal, Benjamin Lecouteux, Laurent Besacier. COMPARING GRU AND LSTM FOR AUTOMATIC SPEECH RECOGNITION. [Research Report] LIG. 2016. 〈hal-01633254〉

Metrics

Record views: 153
File downloads: 1455