QAlign: A New Method for Bilingual Lexicon Extraction from Comparable Corpora
Conference paper, year: 2012


Abstract

In this paper, we present a new way of looking at the problem of bilingual lexicon extraction from comparable corpora, inspired mainly by the information retrieval (IR) domain and, more specifically, by question-answering systems (QAS). By analogy with QAS, we treat a word to be translated as part of a question extracted from the source language, and we look for its translation under the assumption that it is contained in the correct answer to that question, extracted from the target language. Methods traditionally used for bilingual lexicon extraction from comparable corpora typically represent all the contexts of a word in a single vector, and thus give a global representation of those contexts. We believe that a local representation of the contexts of a word, given by a window that corresponds to the query, is more appropriate, because it gives more weight to local information that could be swallowed up if all contexts were represented and processed as a single context vector. We show that the empirical results obtained are competitive with those of the standard approach to this task.
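To make the contrast between a single global context vector and local query windows concrete, here is a minimal Python sketch. It is not the authors' QAlign implementation. It assumes raw co-occurrence counts in a symmetric window (real systems usually weight counts with an association measure), a toy seed dictionary for translating context words, cosine similarity, and a given list of target candidates; all function names and the toy data are hypothetical. `rank_global` follows the standard approach of merging every context of a word into one vector. `rank_local` treats each source window as a separate "question" and scores a candidate by how well its own target windows ("answers") match each question; that particular scoring rule is an assumption made for illustration.

```python
import math
from collections import Counter


def windows(tokens, word, size):
    """Yield the local context (the word itself excluded) around each occurrence of `word`."""
    for i, tok in enumerate(tokens):
        if tok == word:
            yield tokens[max(0, i - size):i] + tokens[i + 1:i + 1 + size]


def global_vector(tokens, word, size):
    """Standard approach: merge all contexts of `word` into one count vector."""
    vec = Counter()
    for ctx in windows(tokens, word, size):
        vec.update(ctx)
    return vec


def translate(vec, seed):
    """Map source context words to target words via a (hypothetical) seed dictionary.
    Words missing from the dictionary are dropped."""
    out = Counter()
    for w, c in vec.items():
        if w in seed:
            out[seed[w]] += c
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_global(src, tgt, word, candidates, seed, size=2):
    """Rank candidates by similarity between whole (merged) context vectors."""
    query = translate(global_vector(src, word, size), seed)
    scores = {c: cosine(query, global_vector(tgt, c, size)) for c in candidates}
    return sorted(candidates, key=lambda c: -scores[c])


def rank_local(src, tgt, word, candidates, seed, size=2):
    """QA-style variant (hypothetical scoring): every source window is a question;
    a candidate earns the best match among its own target windows for each
    question, summed over all questions. Windows are never merged."""
    scores = Counter()
    for question in windows(src, word, size):
        q_vec = translate(Counter(question), seed)
        for c in candidates:
            answers = (cosine(q_vec, Counter(a)) for a in windows(tgt, c, size))
            scores[c] += max(answers, default=0.0)
    return sorted(candidates, key=lambda c: -scores[c])


if __name__ == "__main__":
    # Toy English/French "comparable corpora" and seed dictionary (made up for illustration).
    src = ["malignant", "tumor", "cell", "x", "breast", "tumor", "growth"]
    tgt = ["maligne", "tumeur", "cellule", "y", "sein", "tumeur", "croissance",
           "z", "voiture", "rapide", "route"]
    seed = {"malignant": "maligne", "cell": "cellule",
            "breast": "sein", "growth": "croissance"}
    print(rank_global(src, tgt, "tumor", ["voiture", "tumeur"], seed, size=1))
    # → ['tumeur', 'voiture']
    print(rank_local(src, tgt, "tumor", ["voiture", "tumeur"], seed, size=1))
    # → ['tumeur', 'voiture']
```

On this tiny input both rankings agree. The design difference is that `rank_local` never sums windows together, so one strongly matching context is not diluted by the other occurrences of the word.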
No file deposited

Dates and versions

hal-00949335, version 1 (19-02-2014)

Identifiers

  • HAL Id: hal-00949335, version 1

Cite

Amir Hazem, Emmanuel Morin. QAlign: A New Method for Bilingual Lexicon Extraction from Comparable Corpora. The 13th Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2012), Mar 2012, New Delhi, India. pp. 12. ⟨hal-00949335⟩