Conference paper, Year: 2011

Bilingual Lexicon Extraction from Comparable Corpora Enhanced with Parallel Corpora

Abstract

In this article, we present a simple and effective approach for extracting a bilingual lexicon from comparable corpora enhanced with parallel corpora. We use structural characteristics of the documents that make up the comparable corpus to extract high-quality parallel sentences. We then use state-of-the-art techniques to build a specialized bilingual lexicon from these sentences, and we evaluate the contribution of this lexicon when it is added to the comparable corpus-based alignment technique. Finally, we demonstrate the value of this approach through improved translation accuracy for medical words.
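The abstract does not spell out the alignment algorithm, so the following is only a minimal sketch of the context-vector approach commonly used for comparable corpora, assuming that the lexicon built from parallel sentences is simply merged into the seed dictionary. All function names, the toy French and English sentences, and the lexicon entries are hypothetical illustrations, not data or code from the paper.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            neighbours = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(neighbours)
    return vectors


def translate_vector(vector, lexicon):
    """Project a source-language context vector into the target language."""
    projected = Counter()
    for word, count in vector.items():
        for translation in lexicon.get(word, ()):
            projected[translation] += count
    return projected


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank_candidates(word, src_vectors, tgt_vectors, lexicon):
    """Rank target words by similarity to the projected context of `word`."""
    projected = translate_vector(src_vectors.get(word, Counter()), lexicon)
    scores = [(cosine(projected, vec), cand) for cand, vec in tgt_vectors.items()]
    return [cand for _, cand in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Toy comparable corpus (hypothetical, already tokenized).
fr_sentences = [["dépistage", "sein", "tumeur"], ["dépistage", "tumeur", "patient"]]
en_sentences = [
    ["screening", "breast", "tumour"],
    ["screening", "tumour", "patient"],
    ["fever", "patient", "treatment"],
]

# Hypothetical general-language seed dictionary.
general_lexicon = {"patient": ["patient"], "traitement": ["treatment"]}
# Hypothetical specialized entries, standing in for a lexicon built from parallel sentences.
parallel_lexicon = {"sein": ["breast"], "tumeur": ["tumour"]}
merged_lexicon = {**general_lexicon, **parallel_lexicon}

fr_vectors = context_vectors(fr_sentences)
en_vectors = context_vectors(en_sentences)

baseline = rank_candidates("dépistage", fr_vectors, en_vectors, general_lexicon)
enhanced = rank_candidates("dépistage", fr_vectors, en_vectors, merged_lexicon)
print("general lexicon only:", baseline[:3])
print("with parallel-derived entries:", enhanced[:3])
```

In this toy setting the specialized entries let more of the source word's context be projected into the target language, which changes the candidate ranking. Real systems typically add association weighting (for example log-likelihood or mutual information) and much larger corpora; those details are omitted here.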
Main file
W11-1205.pdf (90.71 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00608475, version 1 (13-07-2011)

Identifiers

  • HAL Id: hal-00608475, version 1

Cite

Emmanuel Morin, Emmanuel Prochasson. Bilingual Lexicon Extraction from Comparable Corpora Enhanced with Parallel Corpora. 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web, Jun 2011, Portland, United States. pp. 27-34. ⟨hal-00608475⟩
140 views
239 downloads
