Book sections

Exploiting Comparable Corpora for Lexicon Extraction: Measuring and Improving Corpus Quality

Abstract : In this chapter, we study the problem of measuring the degree of comparability of bilingual corpora, with applications to bilingual lexicon extraction. We first develop a measure which can capture different comparability levels. This measure correlates very well with gold-standard comparability levels and is relatively robust to dictionary coverage. We then propose a well-founded algorithm to improve the quality, in terms of comparability scores, of existing comparable corpora, and show that the bilingual lexicons extracted from corpora enhanced in this way are of better quality. All the experiments in this chapter are performed on French-English comparable corpora.

https://hal.archives-ouvertes.fr/hal-01071744
Contributor : Maria-Irina Nicolae
Submitted on : Monday, October 6, 2014 - 3:55:22 PM
Last modification on : Monday, April 20, 2020 - 11:24:01 AM

Collections

CNRS | UGA | LIG

Citation

Bo Li, Eric Gaussier. Exploiting Comparable Corpora for Lexicon Extraction: Measuring and Improving Corpus Quality. Serge Sharoff, Reinhard Rapp, Pierre Zweigenbaum, Pascale Fung (eds.). Building and Using Comparable Corpora, Springer Berlin Heidelberg, pp.131-149, 2013, 978-3-642-20127-1. ⟨10.1007/978-3-642-20128-8_7⟩. ⟨hal-01071744⟩
