Evaluation of Feature-Space Speaker Adaptation for End-to-End Acoustic Models

Abstract: This paper investigates speaker adaptation techniques for bidirectional long short-term memory (BLSTM) recurrent neural network acoustic models (AMs) trained with the connectionist temporal classification (CTC) objective function. BLSTM-CTC AMs play an important role in end-to-end automatic speech recognition systems, yet speaker adaptation algorithms for these models have received little research attention. We explore three feature-space adaptation approaches for CTC AMs: feature-space maximum likelihood linear regression (fMLLR), i-vector based adaptation, and maximum a posteriori adaptation using GMM-derived features. Experimental results on the TED-LIUM corpus show that, in an unsupervised adaptation mode, speaker adaptation combined with data augmentation techniques yields relative word error rate reductions of up to 11-20%, depending on the test set, over the baseline model built on raw filter-bank features. In addition, the adaptation behavior of BLSTM-CTC AMs is compared with that of time-delay neural network AMs trained with the cross-entropy criterion.
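The relative word error rate (WER) reduction quoted in the abstract is the drop in WER expressed as a percentage of the baseline WER. A minimal sketch of that arithmetic follows; the function name and the input values are hypothetical illustrations and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction, in percent, of an adapted model
    over a baseline model. Both inputs are WERs in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical WERs chosen for illustration only (not taken from the paper).
example = relative_wer_reduction(baseline_wer=20.0, adapted_wer=16.0)
print(f"relative WER reduction: {example:.1f}%")
```

With these illustrative inputs, an absolute drop of 4 WER points from a 20% baseline corresponds to a 20% relative reduction.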
Document type:
Conference paper
LREC 2018, May 2018, Miyazaki, Japan

Contributor: Yannick Estève
Submitted on: Monday, March 26, 2018 - 10:37:12
Last modified on: Thursday, March 29, 2018 - 01:01:00




  • HAL Id: hal-01728526, version 1



Natalia Tomashenko, Yannick Estève. Evaluation of Feature-Space Speaker Adaptation for End-to-End Acoustic Models. LREC 2018, May 2018, Miyazaki, Japan. 〈hal-01728526〉


