Conference papers

Bilingual Lexicon Extraction from Comparable Corpora Enhanced with Parallel Corpora

Abstract : In this article, we present a simple and effective approach for extracting a bilingual lexicon from comparable corpora enhanced with parallel corpora. We use structural characteristics of the documents that make up the comparable corpus to extract high-quality parallel sentences. We then use state-of-the-art techniques to build a specialized bilingual lexicon from these sentences, and we evaluate the contribution of this lexicon when it is added to the comparable corpus-based alignment technique. Finally, we demonstrate the value of this approach through the improvement in translation accuracy for medical words.

Contributor : Emmanuel Morin
Submitted on : Wednesday, July 13, 2011 - 11:15:06 AM
Last modification on : Wednesday, April 27, 2022 - 4:11:07 AM
Long-term archiving on : Monday, November 12, 2012 - 11:00:29 AM


  • HAL Id : hal-00608475, version 1



Emmanuel Morin, Emmanuel Prochasson. Bilingual Lexicon Extraction from Comparable Corpora Enhanced with Parallel Corpora. 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web, Jun 2011, Portland, United States. pp.27-34. ⟨hal-00608475⟩


