Conference paper, Year: 2011

Automatically Finding Semantically Consistent N-grams to Add New Words in LVCSR Systems

Abstract

This paper presents a new method for automatically adding n-grams that contain out-of-vocabulary (OOV) words to a baseline language model (LM). The added n-grams are meant to be grammatically correct and consistent with the meaning of the OOV words. The method first determines the word sequences, i.e., n-grams, in which the use of a given OOV word is most semantically consistent. The conditional probabilities of these n-grams must then be computed. To do so, semantic relations between words are used to map each OOV word to several equivalent in-vocabulary words. Based on these equivalent words, n-grams from the baseline LM are reused to find the word sequences to add and to compute their probabilities. Experiments in which the vocabulary is augmented and recognition is rerun show that the method yields word error rate (WER) improvements comparable to those obtained with a state-of-the-art open-vocabulary LM.
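The abstract does not give the exact procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: each OOV word comes with a list of equivalent in-vocabulary words and a similarity weight, every baseline n-gram containing an equivalent word is copied with the OOV word substituted in, and the copied probability is a weight-averaged mix of the source probabilities. The function name `add_oov_ngrams`, the weighting scheme, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def add_oov_ngrams(baseline_ngrams, equivalents):
    """Derive n-grams for OOV words from a baseline LM (illustrative only).

    baseline_ngrams: dict mapping a tuple of words to a conditional probability.
    equivalents: dict mapping an OOV word to a list of
                 (in_vocabulary_word, weight) pairs (assumed to be given).
    Returns a dict of new n-grams with unnormalized probabilities.
    """
    new_ngrams = defaultdict(float)
    for oov, equivs in equivalents.items():
        total = sum(weight for _, weight in equivs)
        if total == 0:
            continue  # no usable equivalent word for this OOV word
        for iv_word, weight in equivs:
            for ngram, prob in baseline_ngrams.items():
                if iv_word not in ngram:
                    continue
                # Assumption: reuse the baseline n-gram, swapping in the OOV word.
                substituted = tuple(oov if w == iv_word else w for w in ngram)
                new_ngrams[substituted] += (weight / total) * prob
    return dict(new_ngrams)


# Toy data, invented for illustration.
baseline = {
    ("eat", "the", "apple"): 0.2,
    ("eat", "the", "orange"): 0.1,
    ("a", "red", "apple"): 0.4,
}
equivalents = {"kumquat": [("apple", 0.75), ("orange", 0.25)]}

added = add_oov_ngrams(baseline, equivalents)
for ngram, prob in sorted(added.items()):
    print(" ".join(ngram), round(prob, 4))
```

The probabilities produced this way are not normalized. A real system would still have to rescale them, together with back-off weights, so that each history's distribution sums to one.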
Main file
lecorve_icassp2011.pdf (81.39 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00645223, version 1 (27-11-2011)

Identifiers

  • HAL Id: hal-00645223, version 1

Cite

Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot. Automatically Finding Semantically Consistent N-grams to Add New Words in LVCSR Systems. IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, May 2011, Prague, Czech Republic. 4 p., 2 columns. ⟨hal-00645223⟩
