Journal article, Natural Language Engineering, Year: 2016

Exploiting Unbalanced Specialized Comparable Corpora for Bilingual Lexicon Extraction

Emmanuel Morin
Amir Hazem
  • Role: Author
  • PersonId : 905437

Abstract

Most work in bilingual lexicon extraction from comparable corpora rests on the implicit hypothesis that the corpora are balanced in size. However, the historical context-based projection method is relatively insensitive to the size of each part of the comparable corpus. Within this context, we have carried out a study, through a series of experiments, on the influence of unbalanced specialized comparable corpora on the quality of bilingual terminology extraction. Moreover, we have introduced into the context-based projection method a strategy for re-estimating word co-occurrence observations. This strategy uses smoothing or prediction techniques to boost the observed word co-occurrences, which is mainly useful for the smaller part of an unbalanced comparable corpus. Our results show that using unbalanced specialized comparable corpora significantly improves the quality of the extracted lexicons.
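
As a rough illustration of the pipeline the abstract describes, the following Python sketch builds co-occurrence context vectors, re-estimates them, projects the source vector through a seed dictionary, and ranks target words by cosine similarity. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which smoothing or prediction technique is used, so additive (Laplace) smoothing stands in here, and all function names, the window size, and the `alpha` parameter are hypothetical.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_counts(sentences, window=2):
    """Count, for each word, the words seen within `window` tokens of it."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def smoothed_vector(counter, vocabulary, alpha=1.0):
    """Additive smoothing (an assumed stand-in for the paper's technique):
    each vocabulary word gets its count plus alpha, then the vector is
    normalised so that its entries sum to 1."""
    total = sum(counter.values()) + alpha * len(vocabulary)
    return {w: (counter.get(w, 0) + alpha) / total for w in vocabulary}


def project(vector, seed_dictionary):
    """Translate the context words of a source vector with a seed dictionary.
    Context words missing from the dictionary are dropped."""
    projected = Counter()
    for word, weight in vector.items():
        for translation in seed_dictionary.get(word, []):
            projected[translation] += weight
    return projected


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(source_word, src_counts, tgt_counts, seed_dictionary, alpha=1.0):
    """Rank every target word as a candidate translation of `source_word`."""
    src_vocab = sorted({w for c in src_counts.values() for w in c})
    tgt_vocab = sorted({w for c in tgt_counts.values() for w in c})
    src_context = src_counts.get(source_word, Counter())
    src_vec = project(smoothed_vector(src_context, src_vocab, alpha), seed_dictionary)
    scores = {
        target: cosine(src_vec, smoothed_vector(context, tgt_vocab, alpha))
        for target, context in tgt_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In an unbalanced setting, `src_counts` would come from the smaller part of the corpus, where many co-occurrences are unobserved; smoothing gives those unseen pairs a small non-zero weight before the vectors are compared.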
No file deposited

Dates and versions

hal-01188579 , version 1 (31-08-2015)

Identifiers

  • HAL Id : hal-01188579 , version 1

Cite

Emmanuel Morin, Amir Hazem. Exploiting Unbalanced Specialized Comparable Corpora for Bilingual Lexicon Extraction. Natural Language Engineering, 2016, Special Issue: Machine Translation Using Comparable Corpora, pp.27. ⟨hal-01188579⟩