An algorithm for cross-lingual sense-clustering tested in a MT evaluation setting

Abstract: Unsupervised sense induction methods offer a solution to the scarcity of semantic resources. These methods automatically extract semantic information from textual data and create resources adapted to specific applications and domains of interest. In this paper, we present a clustering algorithm for cross-lingual sense induction which generates bilingual semantic inventories from parallel corpora. We describe the clustering procedure and the resulting resources. We then carry out a large-scale evaluation by integrating the resources into a Machine Translation (MT) evaluation metric (METEOR). We show that using the data-driven sense-cluster inventories yields better correlation with human judgments of translation quality than precision-based metrics, and improvements similar to those obtained with a hand-crafted semantic resource.
Document type:
Conference paper
International Workshop on Spoken Language Translation (IWSLT-2010), Dec 2010, Paris, France. pp.219--226, 2010

https://hal.archives-ouvertes.fr/hal-00544745
Contributor: Marianna Apidianaki
Submitted on: Wednesday, 8 December 2010 - 19:17:38
Last modified on: Friday, 4 January 2019 - 17:33:24
Document(s) archived on: Thursday, 10 March 2011 - 12:56:41

File

Apidianaki_and_He10.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00544745, version 1

Citation

Marianna Apidianaki, Yifan He. An algorithm for cross-lingual sense-clustering tested in a MT evaluation setting. International Workshop on Spoken Language Translation (IWSLT-2010), Dec 2010, Paris, France. pp.219--226, 2010. 〈hal-00544745〉

Metrics

Record views

300

File downloads

151