Conference paper. Year: 2010

Modèles de langage probabilistes et possibilistes basés sur le Web (Web-based probabilistic and possibilistic language models)

Stanislas Oger
  • Role: Author
  • PersonId: 770872
  • IdRef: 176527176
Vladimir Popescu
Georges Linarès

Abstract

Language models are usually built either from a closed corpus or from documents retrieved from the World Wide Web, which are then themselves treated as a closed corpus. In this paper we propose other ways of using the Web as a resource for language modeling. We first improve an approach that estimates n-gram probabilities from Web search engine statistics. We then propose a new way of incorporating the information extracted from the Web into a probabilistic framework. Finally, we propose relying on Possibility Theory to exploit this kind of information effectively. We compare these two approaches on two automatic speech recognition tasks: (i) transcribing broadcast news, and (ii) transcribing domain-specific data, namely commentary on surgical operation films. We show that the two approaches are each effective in different situations.
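The abstract contrasts a probabilistic and a possibilistic use of the same Web evidence but does not give the formulas. As a rough illustration only, the sketch below assumes search-engine hit counts are available as a dictionary and applies two generic textbook choices: a relative-frequency ratio for the n-gram probability, and max-normalization for the possibility distribution. This is not the authors' formulation; the function names and hit counts are hypothetical.

```python
from typing import Dict, Iterable

# Generic illustration, NOT the paper's formulation: hit counts are assumed to
# come from a search engine and are passed in as a plain dictionary.


def web_ngram_probability(hits: Dict[str, int], history: str, word: str) -> float:
    """Estimate P(word | history) as hits("history word") / hits("history").

    Search-engine hit counts are estimates and need not be consistent, so the
    ratio is clamped to 1.0; an unseen history yields 0.0 (no smoothing here).
    """
    denom = hits.get(history, 0)
    if denom == 0:
        return 0.0
    return min(1.0, hits.get(f"{history} {word}", 0) / denom)


def web_ngram_possibility(hits: Dict[str, int], history: str,
                          candidates: Iterable[str]) -> Dict[str, float]:
    """Max-normalized possibility distribution over candidate next words.

    The best-supported candidate gets possibility 1.0 and the others are
    scaled relative to it. With no Web evidence at all, every candidate gets
    1.0, which encodes total ignorance in possibility theory.
    """
    counts = {w: hits.get(f"{history} {w}", 0) for w in candidates}
    top = max(counts.values(), default=0)
    if top == 0:
        return {w: 1.0 for w in counts}
    return {w: c / top for w, c in counts.items()}


# Hypothetical hit counts, invented for illustration only.
hits = {
    "the surgeon": 1000,
    "the surgeon cuts": 200,
    "the surgeon removes": 50,
    "the surgeon sings": 0,
}

for w in ("cuts", "removes", "sings"):
    print(w, web_ngram_probability(hits, "the surgeon", w))
print(web_ngram_possibility(hits, "the surgeon", ["cuts", "removes", "sings"]))
```

With these made-up counts, "cuts" gets probability 0.2 but possibility 1.0: the probabilistic score is relative to every continuation of the history, whereas the possibilistic score only ranks the candidates against the best-supported one.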
No file deposited

Dates and versions

hal-01319889, version 1 (23-05-2016)

Identifiers

  • HAL Id: hal-01319889, version 1

Cite

Stanislas Oger, Vladimir Popescu, Georges Linarès. Modèles de langage probabilistes et possibilistes basés sur le Web. JEP, May 2010, Mons, Belgium. ⟨hal-01319889⟩

Collections

UNIV-AVIGNON LIA