Conference paper, Year: 2013

Combination of Random Indexing based Language Model and N-gram Language Model for Speech Recognition

Dominique Fohr
Odile Mella

Abstract

This paper presents the results and conclusions of a study on the introduction of semantic information, through the Random Indexing paradigm, into statistical language models used in speech recognition. Random Indexing is an alternative to Latent Semantic Analysis (LSA) that addresses the scalability problem of LSA. After a brief presentation of Random Indexing (RI), this paper describes different methods to estimate the RI matrix, then how to derive probabilities from the RI matrix, and finally how to combine them with n-gram language model probabilities. It then analyzes the performance of these different RI methods and of their combinations with a 4-gram language model by computing the perplexity of a test corpus of 290,000 words from the French evaluation campaign ETAPE. The main conclusions are (1) regardless of the method, function words should not be taken into account in the estimation of the RI matrix; (2) the two methods RI_basic and TTRI_w achieved the best perplexity, i.e., a relative gain of 3% compared to the perplexity of the 4-gram language model alone (136.2 vs. 140.4).
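The abstract gives no implementation details, so the Python sketch below only illustrates the general ideas it mentions: sparse ternary random index vectors, context vectors accumulated over a co-occurrence window that skips function words (conclusion (1)), a similarity-based probability normalized over the vocabulary, and a linear interpolation with an n-gram probability before computing perplexity. The dimensionality, window size, normalization scheme, and interpolation weight are illustrative assumptions, not the paper's RI_basic or TTRI_w methods.

```python
import numpy as np

DIM = 1000      # dimensionality of the random index vectors (assumption)
NONZERO = 10    # number of +/-1 entries per index vector (assumption)
WINDOW = 2      # co-occurrence window on each side (assumption)

rng = np.random.default_rng(0)

def index_vector():
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    v = np.zeros(DIM)
    pos = rng.choice(DIM, size=NONZERO, replace=False)
    v[pos] = rng.choice([-1.0, 1.0], size=NONZERO)
    return v

def train_ri(corpus_sentences, function_words):
    """Accumulate a context vector per word by summing the index vectors of its
    neighbours; function words are skipped, following the paper's conclusion (1)."""
    index_vecs, context_vecs = {}, {}
    for sentence in corpus_sentences:
        words = [w for w in sentence if w not in function_words]
        for w in words:
            index_vecs.setdefault(w, index_vector())
            context_vecs.setdefault(w, np.zeros(DIM))
        for i, w in enumerate(words):
            for j in range(max(0, i - WINDOW), min(len(words), i + WINDOW + 1)):
                if j != i:
                    context_vecs[w] += index_vecs[words[j]]
    return context_vecs

def ri_probability(word, history, context_vecs, vocab):
    """Turn RI similarities into a probability by normalizing the (clipped)
    cosine similarity between each candidate word and the history vector over
    the vocabulary -- one plausible choice, not necessarily the paper's."""
    hist_vec = sum(context_vecs[h] for h in history if h in context_vecs)
    if not np.any(hist_vec):
        return 1.0 / len(vocab)
    def sim(w):
        v = context_vecs.get(w)
        if v is None or not np.any(v):
            return 0.0
        return max(0.0, v @ hist_vec / (np.linalg.norm(v) * np.linalg.norm(hist_vec)))
    scores = {w: sim(w) for w in vocab}
    total = sum(scores.values()) or 1.0
    return scores[word] / total

def combined_probability(p_ngram, p_ri, lam=0.9):
    """Linear interpolation of the n-gram and RI probabilities
    (the weight is a tuning parameter; the value here is illustrative)."""
    return lam * p_ngram + (1.0 - lam) * p_ri

def perplexity(log_probs):
    """Perplexity of a test corpus from per-word natural-log probabilities."""
    return float(np.exp(-np.mean(log_probs)))
```

A 4-gram probability from any standard language-modeling toolkit could be passed as p_ngram; the interpolation weight would normally be tuned on held-out data rather than fixed as above.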
No file deposited

Dates and versions

hal-00833898, version 1 (13-06-2013)

Identifiers

  • HAL Id: hal-00833898, version 1

Cite

Dominique Fohr, Odile Mella. Combination of Random Indexing based Language Model and N-gram Language Model for Speech Recognition. INTERSPEECH - 14th Annual Conference of the International Speech Communication Association - 2013, Aug 2013, Lyon, France. ⟨hal-00833898⟩