Conference papers

Evaluation of Feature-Space Speaker Adaptation for End-to-End Acoustic Models

Abstract : This paper investigates speaker adaptation techniques for bidirectional long short-term memory (BLSTM) recurrent neural network acoustic models (AMs) trained with the connectionist temporal classification (CTC) objective function. BLSTM-CTC AMs play an important role in end-to-end automatic speech recognition systems. However, there has been little research on speaker adaptation algorithms for these models. We explore three feature-space adaptation approaches for CTC AMs: feature-space maximum likelihood linear regression (fMLLR), i-vector based adaptation, and maximum a posteriori adaptation using GMM-derived features. Experimental results on the TED-LIUM corpus show that speaker adaptation in unsupervised mode, combined with data augmentation techniques, reduces the word error rate by up to 11-20% relative, depending on the test set, over the baseline model built on raw filter-bank features. In addition, the adaptation behavior of BLSTM-CTC AMs is compared with that of time-delay neural network AMs trained with the cross-entropy criterion.

https://hal.archives-ouvertes.fr/hal-01728526
Contributor : Yannick Estève
Submitted on : Monday, March 26, 2018 - 10:37:12 AM
Last modification on : Thursday, March 29, 2018 - 1:01:00 AM

Files

main.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01728526, version 1

Citation

Natalia Tomashenko, Yannick Estève. Evaluation of Feature-Space Speaker Adaptation for End-to-End Acoustic Models. LREC 2018, May 2018, Miyazaki, Japan. ⟨hal-01728526⟩
