Qualitative Evaluation of Language Model Rescoring in Automatic Speech Recognition

Thibault Bañeras Roux; Mickael Rouvier; Jane Wottawa; Richard Dufour

Conference paper, Year: 2022



Thibault Bañeras Roux

Role: Author
PersonId : 755046
IdHAL : thibault-baneras-roux

Traitement Automatique du Langage Naturel

Mickael Rouvier

Role: Author
PersonId : 982551
IdHAL : mickael-rouvier
ORCID : 0000-0003-3541-3385

Laboratoire Informatique d'Avignon

Jane Wottawa

Role: Author
PersonId : 174626
IdHAL : jwottawa
ORCID : 0000-0002-3855-0695
IdRef : 226300307

Laboratoire d'Informatique de l'Université du Mans

Richard Dufour

Role: Author
PersonId : 178348
IdHAL : richard-dufour
ORCID : 0000-0003-1203-9108

Traitement Automatique du Langage Naturel

Abstract

Evaluating automatic speech recognition (ASR) systems is a classical but difficult and still open problem, which often boils down to focusing only on the word error rate (WER). However, this metric suffers from many limitations and does not allow an in-depth analysis of automatic transcription errors. In this paper, we propose to study and understand the impact of rescoring using language models in ASR systems by means of several metrics often used in other natural language processing (NLP) tasks in addition to the WER. In particular, we introduce two measures related to morpho-syntactic and semantic aspects of transcribed words: 1) the POSER (Part-of-speech Error Rate), which should highlight the grammatical aspects, and 2) the EmbER (Embedding Error Rate), a measurement that modifies the WER by providing a weighting according to the semantic distance of the wrongly transcribed words. These metrics illustrate the linguistic contributions of the language models that are applied during a posterior rescoring step on transcription hypotheses.
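The relationship between the three metrics named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the WER is the usual edit distance normalised by reference length, the POSER is approximated as the same rate computed over part-of-speech tag sequences, and the EmbER is approximated by lowering the substitution cost when two words have similar embeddings. The function names, the 0.1 soft cost, the 0.4 similarity threshold, the toy vectors, and the hand-written tags are all hypothetical and are not the authors' settings.

```python
import math


def weighted_error_rate(reference, hypothesis, sub_cost=None):
    """Edit distance with a pluggable substitution cost, divided by len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    if sub_cost is None:
        # Plain WER: every substitution costs 1.
        sub_cost = lambda r, h: 0.0 if r == h else 1.0
    n, m = len(reference), len(hypothesis)
    # dist[i][j] = minimal cost to turn reference[:i] into hypothesis[:j]
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = float(i)  # i deletions
    for j in range(1, m + 1):
        dist[0][j] = float(j)  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j] + 1.0,  # deletion
                dist[i][j - 1] + 1.0,  # insertion
                dist[i - 1][j - 1] + sub_cost(reference[i - 1], hypothesis[j - 1]),
            )
    return dist[n][m] / n


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def make_soft_sub_cost(vectors, threshold=0.4, soft_cost=0.1):
    """Substitution cost that is reduced for semantically close word pairs.

    threshold and soft_cost are illustrative assumptions.
    """
    def cost(ref_word, hyp_word):
        if ref_word == hyp_word:
            return 0.0
        if ref_word in vectors and hyp_word in vectors:
            if cosine(vectors[ref_word], vectors[hyp_word]) >= threshold:
                return soft_cost
        return 1.0
    return cost


# Toy data (hypothetical): 2-d "embeddings" and hand-assigned POS tags.
TOY_VECTORS = {"cat": (1.0, 0.0), "cats": (0.9, 0.1), "car": (0.0, 1.0)}
ref_words, hyp_words = "the cat sat".split(), "the cats sat".split()
ref_tags, hyp_tags = ["DET", "NOUN", "VERB"], ["DET", "NOUN", "VERB"]

wer = weighted_error_rate(ref_words, hyp_words)
poser_like = weighted_error_rate(ref_tags, hyp_tags)
ember_like = weighted_error_rate(ref_words, hyp_words, make_soft_sub_cost(TOY_VECTORS))
print(f"WER={wer:.3f} POSER-like={poser_like:.3f} EmbER-like={ember_like:.3f}")
```

In this toy case the hypothesis swaps "cat" for "cats": the plain WER counts a full error, the tag-level rate counts none because the part of speech is unchanged, and the embedding-weighted rate counts only a fraction of an error. One simplification to keep in mind is that the sketch folds the reduced cost into the alignment itself, whereas a metric could equally fix the WER alignment first and reweight substitutions afterwards.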

Keywords

Automatic speech recognition; Semantic analysis; Language modeling; Evaluation metrics

Domains

Computation and Language [cs.CL]

Main file

Thibault_Roux___InterSpeech_2022_v4.pdf (130.65 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-03712735

Submitted on: Thursday, November 17, 2022, 17:18:30

Last modified on: Friday, February 2, 2024, 10:02:37

Dates and versions

hal-03712735 , version 1 (04-07-2022)

hal-03712735 , version 2 (17-11-2022)

Identifiers

HAL Id : hal-03712735 , version 2

Cite

Thibault Bañeras Roux, Mickael Rouvier, Jane Wottawa, Richard Dufour. Qualitative Evaluation of Language Model Rescoring in Automatic Speech Recognition. Interspeech, Sep 2022, Incheon, South Korea. ⟨hal-03712735v2⟩


Collections

INSTITUT-TELECOM UNIV-AVIGNON CNRS INRIA UNIV-LEMANS EC-NANTES UNAM GENCI LIUM LIUM-LST LS2N LS2N-TALN LIA ANR NANTES-UNIVERSITE

375 views

445 downloads
