COMPARING GRU AND LSTM FOR AUTOMATIC SPEECH RECOGNITION

Abstract: This paper compares Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) networks as acoustic models for speech recognition. While these recurrent models have mainly been evaluated on simple read-speech tasks, we experiment on a large-vocabulary continuous speech recognition task: the transcription of TED talks. In addition to being simpler than LSTM, GRU networks outperform LSTM at every network depth we tested. We also propose a new model, termed DNN-BGRU-DNN, which stacks a Deep Neural Network (DNN), a Bidirectional GRU (BGRU), and another DNN. The first DNN acts as a feature processor, the BGRU stores temporal contextual information, and the final DNN introduces additional non-linearity. Our best model achieves 13.35% WER on the TEDLIUM dataset, a 16.66% and 17.84% relative improvement over the baseline HMM-DNN and HMM-SGMM models respectively.
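
For illustration, a minimal sketch of such a DNN-BGRU-DNN stack is given below in PyTorch. The framework choice, layer sizes, depths, and output dimension are placeholder assumptions for the example only; the record does not reproduce the authors' implementation, and their actual configuration may differ.

import torch
import torch.nn as nn

class DNNBGRUDNN(nn.Module):
    """Illustrative DNN-BGRU-DNN acoustic model: a front DNN for feature
    processing, a bidirectional GRU for temporal context, and a back DNN
    adding non-linearity before per-frame HMM-state classification.
    All dimensions here are assumptions, not the authors' settings."""

    def __init__(self, feat_dim=40, hidden_dim=512, num_states=4000):
        super().__init__()
        # First DNN: processes the raw acoustic feature frames.
        self.front_dnn = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Bidirectional GRU: captures temporal contextual information
        # in both directions along the utterance.
        self.bgru = nn.GRU(hidden_dim, hidden_dim,
                           batch_first=True, bidirectional=True)
        # Final DNN: adds non-linearity and maps to HMM-state logits.
        self.back_dnn = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_states),
        )

    def forward(self, x):
        # x: (batch, time, feat_dim) frames of acoustic features
        h = self.front_dnn(x)
        h, _ = self.bgru(h)      # h: (batch, time, 2 * hidden_dim)
        return self.back_dnn(h)  # per-frame HMM-state logits

# Example: 8 utterances of 200 frames with 40-dimensional features.
model = DNNBGRUDNN()
logits = model(torch.randn(8, 200, 40))  # shape: (8, 200, 4000)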
Document type: Research report

Cited literature: 25 references

https://hal.archives-ouvertes.fr/hal-01633254
Contributor: Benjamin Lecouteux
Submitted on: Sunday, November 12, 2017 - 8:02:09 AM
Last modified on: Tuesday, February 12, 2019 - 1:31:31 AM
Archived on: Tuesday, February 13, 2018 - 12:29:25 PM

File

comparing-gru-lstm.pdf (produced by the author(s))

Identifiers

  • HAL Id: hal-01633254, version 1

Citation

Shubham Khandelwal, Benjamin Lecouteux, Laurent Besacier. COMPARING GRU AND LSTM FOR AUTOMATIC SPEECH RECOGNITION. [Research Report] LIG. 2016. ⟨hal-01633254⟩
