Conference paper, Year: 2018

Evaluation of Feature-Space Speaker Adaptation for End-to-End Acoustic Models

Abstract

This paper investigates speaker adaptation techniques for bidirectional long short-term memory (BLSTM) recurrent neural network acoustic models (AMs) trained with the connectionist temporal classification (CTC) objective function. BLSTM-CTC AMs play an important role in end-to-end automatic speech recognition systems, yet speaker adaptation algorithms for these models have received little research attention. We explore three feature-space adaptation approaches for CTC AMs: feature-space maximum likelihood linear regression (fMLLR), i-vector based adaptation, and maximum a posteriori (MAP) adaptation using GMM-derived features. Experimental results on the TED-LIUM corpus show that speaker adaptation, applied in unsupervised mode and combined with data augmentation, yields relative word error rate reductions of up to 11–20%, depending on the test set, over a baseline model built on raw filter-bank features. In addition, the adaptation behavior of BLSTM-CTC AMs is compared with that of time-delay neural network AMs trained with the cross-entropy criterion.
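
The relative word error rate (WER) reduction reported in the abstract is, under the usual definition, the drop in WER divided by the baseline WER. A minimal sketch, assuming that definition; the function name and the WER values below are hypothetical and are not results from the paper:

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction, in percent, of an adapted model over a baseline.

    Both arguments are WERs expressed in the same unit (for example, percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical WERs in percent, chosen only for illustration.
baseline_wer = 15.0
adapted_wer = 12.0
reduction = relative_wer_reduction(baseline_wer, adapted_wer)
print(f"{reduction:.1f}% relative WER reduction")
```

With these illustrative values, an absolute drop of 3 WER points from a baseline of 15 corresponds to a 20% relative reduction, which shows why relative and absolute figures should not be confused when comparing test sets.
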
Main file
main.pdf (526.75 KB)
main.synctex.gz (133.83 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01728526, version 1 (26-03-2018)

Identifiers

  • HAL Id: hal-01728526, version 1

Cite

Natalia Tomashenko, Yannick Estève. Evaluation of Feature-Space Speaker Adaptation for End-to-End Acoustic Models. LREC 2018, May 2018, Miyazaki, Japan. ⟨hal-01728526⟩
