Conference paper, Year: 2003

Using the Web for fast language model construction in minority languages

Viet-Bac Le
  • Role: Author
Brigitte Bigi
  • Role: Author
Laurent Besacier
  • Role: Author
Eric Castelli
  • Role: Author

Abstract

Designing and building a language model for a minority language is a hard task. By minority language, we mean a language with few available resources, especially for statistical learning. In this paper, we propose a new methodology for fast language model construction in minority languages. It relies on Web resources to collect and build usable text corpora. By applying efficient filtering techniques, the methodology allows a language model to be built quickly, at a low cost in terms of computational and human resources. Our initial experiments on a majority language (French) showed excellent performance of Web language models compared with newspaper language models when the proposed filtering methods are used. Following the same approach for a minority language (Vietnamese), a valuable language model was constructed in 3 months, with only 15% new development needed to convert some of the filtering tools.
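
The abstract does not specify which filtering techniques are used, so the following is only a minimal sketch of the general idea (collect Web text, filter it, estimate a language model), not the authors' method. It assumes a hypothetical vocabulary-based sentence filter (keep sentences of at least `min_length` words whose in-vocabulary ratio reaches `min_in_vocab`) and a bigram model with add-one smoothing. All function names, thresholds, and the toy text are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and extract runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", sentence.lower())


def filter_sentences(raw_text, vocabulary, min_in_vocab=0.8, min_length=3):
    """Assumed filter: keep sentences that are long enough and mostly in-vocabulary."""
    kept = []
    for sentence in re.split(r"[.!?\n]+", raw_text):
        tokens = tokenize(sentence)
        if len(tokens) < min_length:
            continue  # too short to be useful (also skips empty fragments)
        in_vocab = sum(1 for t in tokens if t in vocabulary)
        if in_vocab / len(tokens) >= min_in_vocab:
            kept.append(tokens)
    return kept


def train_bigram(sentences):
    """Count bigrams and their histories over sentences padded with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        histories.update(padded[:-1])            # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))  # adjacent token pairs
    return histories, bigrams


def bigram_logprob(histories, bigrams, vocab_size, prev, word):
    """Add-one smoothed log P(word | prev)."""
    return math.log((bigrams[(prev, word)] + 1) / (histories[prev] + vocab_size))


# Toy "Web page" text: two usable French sentences and one navigation fragment.
raw_text = "Le chat dort sur le lit. Click here to login now! Le chien dort."
vocabulary = {"le", "chat", "chien", "dort", "sur", "lit"}

sentences = filter_sentences(raw_text, vocabulary)
histories, bigrams = train_bigram(sentences)
vocab_size = len(vocabulary) + 1  # vocabulary words plus the end marker </s>

for tokens in sentences:
    print(" ".join(tokens))
print(bigram_logprob(histories, bigrams, vocab_size, "<s>", "le"))
```

In this toy run the navigation fragment is discarded because none of its words are in the vocabulary, and the bigram counts are estimated from the two remaining sentences only. A real pipeline would also need page download, markup removal, and encoding handling, which are outside this sketch.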
Main file
7a1a3724fcfd19af467a1608cfc392b64fab.pdf (38.42 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01392377, version 1 (04-11-2016)

Identifiers

  • HAL Id: hal-01392377, version 1

Cite

Viet-Bac Le, Brigitte Bigi, Laurent Besacier, Eric Castelli. Using the Web for fast language model construction in minority languages. Eurospeech, 2003, Geneva, Switzerland. pp. 3117-3120. ⟨hal-01392377⟩
179 views
166 downloads
