Analyzing Learned Representations of a Deep ASR Performance Prediction Model

This paper addresses a relatively new task: predicting ASR performance on unseen broadcast programs. In a previous paper, we presented an ASR performance prediction system that uses CNNs to encode both text (the ASR transcript) and speech in order to predict word error rate. This work is dedicated to analyzing the speech signal embeddings and text embeddings learnt by the CNN while training our prediction model. We try to better understand which information is captured by the deep model and how it relates to different conditioning factors. We show that the hidden layers convey a clear signal about speech style, accent and broadcast type. We then try to leverage these three types of information at training time through multi-task learning. Our experiments show that this makes it possible to train slightly more efficient ASR performance prediction systems that, in addition, simultaneously tag the analyzed utterances according to their speech style, accent and broadcast program origin.

Domains

Computation and Language [cs.CL]

Main file

emnlp2018.pdf (626.71 KB)

Origin: Files produced by the author(s)

Contributor: Benjamin Lecouteux

https://hal.science/hal-01863293

Submitted on: Tuesday, 28 August 2018, 11:58:58

Last modified on: Monday, 15 April 2024, 11:25:23

Long-term archiving on: Thursday, 29 November 2018, 15:22:49

Dates and versions

hal-01863293 , version 1 (28-08-2018)

Identifiers

HAL Id: hal-01863293, version 1

Cite

Zied Elloumi, Laurent Besacier, Olivier Galibert, Benjamin Lecouteux. Analyzing Learned Representations of a Deep ASR Performance Prediction Model. Blackbox NLP Workshop and EMNLP 2018, Nov 2018, Bruxelles, Belgium. ⟨hal-01863293⟩

Collections

UNIV-BOURGOGNE UGA CNRS LIG CIMEOS LIG_TDCGE_GETALP LNE POLYTECH-GRENOBLE LIG_SIDCH
