Conference papers

Clustering Comparable Corpora For Bilingual Lexicon Extraction

Abstract: This paper studies how to enhance the comparability of bilingual corpora in order to improve the quality of bilingual lexicons extracted from comparable corpora. We introduce a clustering-based approach to enhancing corpus comparability that exploits the homogeneity of the corpus while preserving most of the vocabulary of the original corpus. Our experiments support the soundness of this method and show that the bilingual lexicons obtained from the homogeneous corpus are of better quality than those obtained with previous approaches.

Cited literature: 16 references
Contributor: Eric Gaussier
Submitted on: Wednesday, October 17, 2012 - 4:12:22 PM
Last modification on: Monday, April 20, 2020 - 11:24:01 AM
Archived on: Friday, January 18, 2013 - 3:44:37 AM


Publisher files allowed on an open archive


  • HAL Id : hal-00742264, version 1


Li Bo, Éric Gaussier, Akiko Aizawa. Clustering Comparable Corpora For Bilingual Lexicon Extraction. ACL-HLT 2011, Jun 2011, Portland, Oregon, United States. pp.473-478. ⟨hal-00742264⟩


