Improving the clustering or categorization of bi-lingual data by means of comparability mapping - HAL open archive
Preprint, working paper. Year: 2013

Abstract

In this paper, we address the co-clustering and co-classification of bilingual data by mixing the similarity measures available in each of the two linguistic spaces with a comparability measure that defines a mapping between these two spaces. We propose a new approach that combines comparability and similarity measures in order to jointly improve the accuracy of the classification and clustering algorithms run in each linguistic space, as well as the mapping between the comparable clusters they produce. We also propose two variants of the comparability measure defined in [1] and evaluate our co-classification and co-clustering strategy on a data set collected from Wikipedia categories. Our experiments show clear improvements in clustering and classification accuracy when comparability is mixed with similarities, with greater robustness when the two proposed comparability variants are used. We believe that this approach is well suited to building good-quality thematic comparable corpora.
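The abstract does not state the exact rule used to mix similarities with comparability. The sketch below illustrates one plausible form of the idea, under stated assumptions. Documents from both languages become nodes of a single graph. Intra-language edges are weighted by `(1 - alpha)` times a cosine similarity, and cross-language edges by `alpha` times a comparability score. Clusters are then read off as connected components above a threshold, so each component pairs an English cluster with its comparable French cluster. The toy data, `toy_comparability`, `co_cluster`, `alpha` and `threshold` are all hypothetical illustrations. They are not the authors' method, nor the comparability measure of [1] or its variants.

```python
import math
from collections import Counter

# Hypothetical toy data: tokenized English (language A) and French (language B)
# documents plus a one-to-one bilingual lexicon. Not from the paper's data set.
EN_DOCS = [["cat", "dog", "pet"], ["dog", "pet", "vet"], ["stock", "market", "bank"]]
FR_DOCS = [["chat", "chien", "animal"], ["bourse", "marché", "banque"]]
LEXICON = {"cat": "chat", "dog": "chien", "pet": "animal", "vet": "vétérinaire",
           "stock": "bourse", "market": "marché", "bank": "banque"}


def cosine(u: Counter, v: Counter) -> float:
    """Intra-language similarity: cosine between two bag-of-words vectors."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def toy_comparability(doc_a, doc_b, lexicon) -> float:
    """Hypothetical cross-language score (NOT the measure of [1]): share of distinct
    words, on both sides, whose lexicon translation occurs in the other document."""
    words_a, words_b = set(doc_a), set(doc_b)
    inverse = {target: source for source, target in lexicon.items()}
    hits_a = sum(1 for w in words_a if lexicon.get(w) in words_b)
    hits_b = sum(1 for w in words_b if inverse.get(w) in words_a)
    total = len(words_a) + len(words_b)
    return (hits_a + hits_b) / total if total else 0.0


def co_cluster(docs_a, docs_b, lexicon, alpha=0.5, threshold=0.3):
    """Cluster both languages jointly; returns (A indices, B indices) per cluster."""
    nodes = [("A", i) for i in range(len(docs_a))] + [("B", j) for j in range(len(docs_b))]
    parent = list(range(len(nodes)))  # union-find forest over all documents

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def weight(p, q):
        (lang_p, i), (lang_q, j) = nodes[p], nodes[q]
        if lang_p == lang_q:  # same language: mixed-in similarity
            docs = docs_a if lang_p == "A" else docs_b
            return (1 - alpha) * cosine(Counter(docs[i]), Counter(docs[j]))
        a, b = (i, j) if lang_p == "A" else (j, i)  # cross-language: comparability
        return alpha * toy_comparability(docs_a[a], docs_b[b], lexicon)

    # Link every pair whose mixed weight reaches the threshold.
    for p in range(len(nodes)):
        for q in range(p + 1, len(nodes)):
            if weight(p, q) >= threshold:
                parent[find(p)] = find(q)

    # Each connected component maps an A-side cluster to a B-side cluster.
    clusters = {}
    for p, (lang, idx) in enumerate(nodes):
        clusters.setdefault(find(p), ([], []))[0 if lang == "A" else 1].append(idx)
    return list(clusters.values())


if __name__ == "__main__":
    print(co_cluster(EN_DOCS, FR_DOCS, LEXICON))
    print(co_cluster(EN_DOCS, FR_DOCS, LEXICON, alpha=0.0))
```

On this toy data the first call returns `[([0, 1], [0]), ([2], [1])]`: the two pet documents are grouped with the French pet document, and the finance documents with each other. With `alpha=0.0` the comparability term vanishes, and each French document ends up in a cluster of its own with no English counterpart. Threshold-based connected components are used here only for brevity. A more realistic setup would likely feed the same joint affinity matrix to a proper clustering algorithm such as spectral clustering.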
Main file
Co-ClassifOfBilingualData-v2.pdf (150.52 KB)
Origin: files produced by the author(s)

Dates and versions

hal-00958730, version 1 (13-03-2014)
hal-00958730, version 2 (25-02-2015)
hal-00958730, version 3 (25-02-2015)

Identifiers

  • HAL Id: hal-00958730, version 1

Cite

Guiyao Ke, Pierre-François Marteau, Gilbas Ménier. Improving the clustering or categorization of bi-lingual data by means of comparability mapping. 2013. ⟨hal-00958730v1⟩
