Clustering Comparable Corpora For Bilingual Lexicon Extraction

Abstract: In this paper we study the problem of enhancing the comparability of bilingual corpora in order to improve the quality of bilingual lexicons extracted from comparable corpora. We introduce a clustering-based approach for enhancing corpus comparability which exploits the homogeneity feature of the corpus and preserves most of the vocabulary of the original corpus. Our experiments illustrate the well-foundedness of this method and show that the bilingual lexicons obtained from the homogeneous corpus are of better quality than the lexicons obtained with previous approaches.
Document type:
Conference paper
ACL-HLT 2011, Jun 2011, Portland, Oregon, United States. Association for Computational Linguistics, pp.473-478, 2011

https://hal.archives-ouvertes.fr/hal-00742264
Contributor: Eric Gaussier
Submitted on: Wednesday, 17 October 2012 - 16:12:22
Last modified on: Tuesday, 28 October 2014 - 18:35:11
Document(s) archived on: Friday, 18 January 2013 - 03:44:37

File

Li-Gaussier-Azawa-ACL_11_web.p...
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-00742264, version 1

Citation

Li Bo, Éric Gaussier, Akiko Aizawa. Clustering Comparable Corpora For Bilingual Lexicon Extraction. ACL-HLT 2011, Jun 2011, Portland, Oregon, United States. Association for Computational Linguistics, pp.473-478, 2011. <hal-00742264>

Metrics

Record views: 267

Document downloads: 204