Exploiting Comparable Corpora for Lexicon Extraction: Measuring and Improving Corpus Quality

Abstract: We study in this chapter the problem of measuring the degree of comparability of bilingual corpora, with applications to bilingual lexicon extraction. We first develop a measure which can capture different comparability levels. This measure correlates very well with gold-standard comparability levels and is relatively robust to dictionary coverage. We then propose a well-founded algorithm to improve the quality, in terms of comparability scores, of existing comparable corpora, and show that the bilingual lexicons extracted from corpora enhanced in this way are of better quality. All the experiments in this chapter are performed on French-English comparable corpora.
Document type:
Book chapter
Sharoff, Serge and Rapp, Reinhard and Zweigenbaum, Pierre and Fung, Pascale. Building and Using Comparable Corpora, Springer Berlin Heidelberg, pp.131-149, 2013, 978-3-642-20127-1. 〈10.1007/978-3-642-20128-8_7〉

https://hal.archives-ouvertes.fr/hal-01071744
Contributor: Maria-Irina Nicolae
Submitted on: Monday, October 6, 2014 - 15:55:22
Last modified on: Thursday, October 11, 2018 - 08:48:04

Citation

Bo Li, Eric Gaussier. Exploiting Comparable Corpora for Lexicon Extraction: Measuring and Improving Corpus Quality. Sharoff, Serge and Rapp, Reinhard and Zweigenbaum, Pierre and Fung, Pascale. Building and Using Comparable Corpora, Springer Berlin Heidelberg, pp.131-149, 2013, 978-3-642-20127-1. 〈10.1007/978-3-642-20128-8_7〉. 〈hal-01071744〉
