Vector disambiguation for translation extraction from comparable corpora. - HAL open archive
Journal article, Informatica. Year: 2013

Vector disambiguation for translation extraction from comparable corpora.

Nikola Ljubesic
  • Role: Author
  • PersonId: 990315
Darja Fiser
  • Role: Author
  • PersonId: 907730

Abstract

We present a new data-driven approach to improving the extraction of translation equivalents from comparable corpora, which exploits bilingual lexico-semantic knowledge harvested from a parallel corpus. First, the bilingual lexicon obtained by word-aligning the parallel corpus replaces an external seed dictionary, making the approach knowledge-light and portable. Next, instead of relying on simple one-to-one mappings between the source and the target language, translation equivalents are clustered into sets of synonyms by a cross-lingual Word Sense Induction method. Using a cross-lingual Word Sense Disambiguation method, the resulting sense clusters let us translate each vector feature with several translation variants rather than a single one. The vector features are thus disambiguated and translated with the variants contained in the semantically most appropriate cluster, producing less noisy and richer vectors that can be compared across languages more successfully than with previous methods.
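The central step of the abstract, translating each context-vector feature with all variants of its most appropriate sense cluster and then comparing vectors across languages, can be illustrated with a small Python sketch. It is not the authors' implementation. The sense clusters are taken as given input (the Word Sense Induction step is not shown), and the following are all assumptions made for illustration: the hypothetical `SenseCluster` structure, the use of source-language context words as each cluster's disambiguation signature, the even split of a feature's weight across its translation variants, cosine similarity as the comparison measure, and the toy English-to-French data.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SenseCluster:
    """One induced sense of a source word: a set of synonymous translations.

    Hypothetical structure: `context` holds source-language words assumed to
    signal this sense; it serves as the disambiguation signature below.
    """
    translations: list[str]
    context: set[str] = field(default_factory=set)


def choose_cluster(clusters: list[SenseCluster], vector: dict[str, float]) -> SenseCluster:
    """Simplified WSD (assumption): pick the cluster whose context words carry
    the most weight in the vector being translated; ties go to the first."""
    def score(cluster: SenseCluster) -> float:
        return sum(w for word, w in vector.items() if word in cluster.context)
    return max(clusters, key=score)


def translate_vector(vector: dict[str, float],
                     lexicon: dict[str, list[SenseCluster]]) -> dict[str, float]:
    """Translate every feature with all variants of its chosen cluster.

    Assumption: a feature's weight is split evenly across the variants.
    Features missing from the lexicon are dropped.
    """
    translated: dict[str, float] = {}
    for feature, weight in vector.items():
        clusters = lexicon.get(feature)
        if not clusters:
            continue
        cluster = choose_cluster(clusters, vector)
        share = weight / len(cluster.translations)
        for target in cluster.translations:
            translated[target] = translated.get(target, 0.0) + share
    return translated


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(source_vector, lexicon, target_vectors) -> list[str]:
    """Rank target-language words by similarity to the translated source vector."""
    translated = translate_vector(source_vector, lexicon)
    scores = {word: cosine(translated, vec) for word, vec in target_vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy English-to-French data, invented for illustration only.
LEXICON = {
    "bank": [SenseCluster(["rive", "berge"], {"river", "water"}),
             SenseCluster(["banque"], {"money", "account", "loan"})],
    "money": [SenseCluster(["argent"], {"bank"})],
    "account": [SenseCluster(["compte"], {"bank", "money"})],
}
SOURCE_VECTOR = {"bank": 2.0, "money": 1.0, "account": 1.0}  # context of "deposit"
TARGET_VECTORS = {
    "versement": {"banque": 2.0, "argent": 1.0, "compte": 1.0},
    "alluvion": {"rive": 1.0, "berge": 1.0, "eau": 1.0},
}

if __name__ == "__main__":
    # "bank" is disambiguated to the financial cluster by "money" and "account".
    print(translate_vector(SOURCE_VECTOR, LEXICON))
    # → {'banque': 2.0, 'argent': 1.0, 'compte': 1.0}
    print(rank_candidates(SOURCE_VECTOR, LEXICON, TARGET_VECTORS))
    # → ['versement', 'alluvion']
```

Splitting the weight keeps a feature from gaining influence in the translated vector merely because its chosen cluster has many variants; copying the full weight to every variant would be an equally plausible choice under a different assumption.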
No file deposited

Dates and versions

hal-01620307, version 1 (20-10-2017)

Identifiers

  • HAL Id: hal-01620307, version 1

Cite

Marianna Apidianaki, Nikola Ljubesic, Darja Fiser. Vector disambiguation for translation extraction from comparable corpora. Informatica, 2013, 37, pp. 193-201. ⟨hal-01620307⟩
