Evaluation of Feature-Space Speaker Adaptation for End-to-End Acoustic Models

Abstract: This paper investigates speaker adaptation techniques for bidirectional long short-term memory (BLSTM) recurrent neural network acoustic models (AMs) trained with the connectionist temporal classification (CTC) objective function. BLSTM-CTC AMs play an important role in end-to-end automatic speech recognition systems, yet speaker adaptation algorithms for these models remain little studied. We explore three feature-space adaptation approaches for CTC AMs: feature-space maximum likelihood linear regression (fMLLR), i-vector based adaptation, and maximum a posteriori adaptation using GMM-derived features. Experimental results on the TED-LIUM corpus show that, in unsupervised adaptation mode, speaker adaptation combined with data augmentation yields relative word error rate reductions of up to 11-20% across different test sets over the baseline model built on raw filter-bank features. In addition, the adaptation behavior is compared for BLSTM-CTC AMs and time-delay neural network AMs trained with the cross-entropy criterion.
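
The abstract reports its gains as relative word error rate (WER) reduction. As a minimal sketch of how that figure is conventionally computed, the snippet below takes a baseline WER and an adapted-system WER, both in percent. The function name and the numbers in the usage line are hypothetical illustrations and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction, in percent, of an adapted system
    over a baseline. Both inputs are WERs expressed in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    # Relative reduction = (baseline - adapted) / baseline, scaled to percent.
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical values chosen only to show the arithmetic.
example = relative_wer_reduction(20.0, 16.0)
```

With these illustrative inputs, an absolute drop of 4 WER points from a 20% baseline corresponds to a 20% relative reduction, which is why relative and absolute figures should not be compared directly.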
Document type:
Conference paper
LREC 2018, May 2018, Miyazaki, Japan

https://hal.archives-ouvertes.fr/hal-01728526
Contributor: Yannick Estève
Submitted on: Monday, 26 March 2018 - 10:37:12
Last modified on: Thursday, 29 March 2018 - 01:01:00

Files

main.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01728526, version 1

Citation

Natalia Tomashenko, Yannick Estève. Evaluation of Feature-Space Speaker Adaptation for End-to-End Acoustic Models. LREC 2018, May 2018, Miyazaki, Japan. 〈hal-01728526〉
