COMPARING GRU AND LSTM FOR AUTOMATIC SPEECH RECOGNITION

Abstract: This paper compares Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) networks as acoustic models for speech recognition. While these recurrent models have mainly been evaluated on simple read-speech tasks, we experiment on a large-vocabulary continuous speech recognition task: the transcription of TED talks. In addition to being simpler than LSTM, GRU networks outperform LSTM at every network depth we tested. We also propose a new model, termed DNN-BGRU-DNN, which stacks a Deep Neural Network (DNN), a Bidirectional GRU (BGRU), and another DNN. The first DNN acts as a feature processor, the BGRU stores temporal contextual information, and the final DNN introduces additional non-linearity. Our best model achieves 13.35% WER on the TEDLIUM dataset, a 16.66% and 17.84% relative improvement over the baseline HMM-DNN and HMM-SGMM models respectively.
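
For illustration, a minimal sketch of such a DNN-BGRU-DNN stack is given below in PyTorch. The framework choice, layer sizes, depths, and output dimension are placeholder assumptions for the example only; the record does not reproduce the authors' implementation, and their actual configuration may differ.

import torch
import torch.nn as nn

class DNNBGRUDNN(nn.Module):
    """Illustrative DNN-BGRU-DNN acoustic model: a front DNN for feature
    processing, a bidirectional GRU for temporal context, and a back DNN
    adding non-linearity before per-frame HMM-state classification.
    All dimensions here are assumptions, not the authors' settings."""

    def __init__(self, feat_dim=40, hidden_dim=512, num_states=4000):
        super().__init__()
        # First DNN: processes the raw acoustic feature frames.
        self.front_dnn = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Bidirectional GRU: captures temporal contextual information
        # in both directions along the utterance.
        self.bgru = nn.GRU(hidden_dim, hidden_dim,
                           batch_first=True, bidirectional=True)
        # Final DNN: adds non-linearity and maps to HMM-state logits.
        self.back_dnn = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_states),
        )

    def forward(self, x):
        # x: (batch, time, feat_dim) frames of acoustic features
        h = self.front_dnn(x)
        h, _ = self.bgru(h)      # h: (batch, time, 2 * hidden_dim)
        return self.back_dnn(h)  # per-frame HMM-state logits

# Example: 8 utterances of 200 frames with 40-dimensional features.
model = DNNBGRUDNN()
logits = model(torch.randn(8, 200, 40))  # shape: (8, 200, 4000)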
Document type: Research report

Cited literature: 25 references

https://hal.archives-ouvertes.fr/hal-01633254
Contributor: Benjamin Lecouteux
Submitted on: Sunday, November 12, 2017 - 8:02:09 AM
Last modified on: Tuesday, February 12, 2019 - 1:31:31 AM
Archived on: Tuesday, February 13, 2018 - 12:29:25 PM

File

comparing-gru-lstm.pdf (produced by the author(s))

Identifiers

  • HAL Id: hal-01633254, version 1

Citation

Shubham Khandelwal, Benjamin Lecouteux, Laurent Besacier. COMPARING GRU AND LSTM FOR AUTOMATIC SPEECH RECOGNITION. [Research Report] LIG. 2016. ⟨hal-01633254⟩
