Dynamic adjustment of language models for automatic speech recognition using word similarity - HAL open archive
Conference paper, Year: 2016


Abstract

Out-of-vocabulary (OOV) words can pose a particular problem for automatic speech recognition (ASR) of broadcast news. The language models (LMs) of ASR systems are typically trained on static corpora, whereas new words (particularly new proper nouns) are continually introduced in the media. Additionally, such OOVs are often content-rich proper nouns that are vital to understanding the topic. In this work, we explore methods for dynamically adding OOVs to language models by adapting the n-gram language model used in our ASR system. We propose two strategies: the first relies on finding in-vocabulary (IV) words similar to the OOVs, where word embeddings are used to define similarity. Our second strategy leverages a small contemporary corpus to estimate OOV probabilities. The models we propose yield improvements in perplexity over the baseline; in addition, the corpus-based approach leads to a significant decrease in proper noun error rate over the baseline in recognition experiments.
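The first strategy (borrowing statistics from in-vocabulary words that are close to an OOV in embedding space) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes toy two-dimensional embeddings, a unigram table instead of the full n-gram model the paper adapts, and a hypothetical rule that gives each OOV the mean probability of its k most cosine-similar IV words and then renormalizes. This page does not say how the authors actually transfer probability mass.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_iv_words(oov_vec, iv_vectors, k):
    """Return the k in-vocabulary words whose embeddings are most similar to oov_vec."""
    ranked = sorted(iv_vectors.items(),
                    key=lambda item: cosine(oov_vec, item[1]),
                    reverse=True)
    return [word for word, _ in ranked[:k]]


def add_oovs_by_similarity(unigram_probs, iv_vectors, oov_vectors, k=2):
    """Hypothetical adaptation rule: each OOV receives the mean unigram probability
    of its k nearest IV words, then all probabilities are renormalized to sum to 1."""
    adjusted = dict(unigram_probs)
    for oov, vec in oov_vectors.items():
        neighbours = nearest_iv_words(vec, iv_vectors, k)
        adjusted[oov] = sum(unigram_probs[w] for w in neighbours) / len(neighbours)
    total = sum(adjusted.values())
    return {word: p / total for word, p in adjusted.items()}


if __name__ == "__main__":
    # Toy data (invented for illustration): a new proper noun "lisbon" is absent
    # from the baseline vocabulary but lies near other city names in embedding space.
    unigram = {"paris": 0.2, "london": 0.2, "the": 0.5, "rain": 0.1}
    iv_vecs = {"paris": [1.0, 0.1], "london": [0.9, 0.2],
               "the": [0.0, 1.0], "rain": [0.1, 0.8]}
    oov_vecs = {"lisbon": [0.95, 0.15]}
    print(add_oovs_by_similarity(unigram, iv_vecs, oov_vecs, k=2))
```

Averaging over several neighbours rather than copying a single one is a design choice of this sketch: it makes the OOV estimate less sensitive to any one noisy embedding. The second, corpus-based strategy would instead estimate the OOV probability from counts in the small contemporary corpus.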
Main file
slt-paper (6).pdf (175.19 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01384365, version 1 (19-10-2016)

Identifiers

  • HAL Id: hal-01384365, version 1

Cite

Anna Currey, Irina Illina, Dominique Fohr. Dynamic adjustment of language models for automatic speech recognition using word similarity. IEEE Workshop on Spoken Language Technology (SLT 2016), Dec 2016, San Diego, CA, United States. ⟨hal-01384365⟩
