Using a co-similarity approach on a large scale text categorization task

Clément Grimal, Gilles Bisson *
* Corresponding author
Abstract: This paper presents the framework we developed for the second Large Scale Hierarchical Text Categorization challenge (LSHTC2). The main idea is to handle the variability of terms across categories, so that similarities can be found between collections of documents that belong to the same category but share few terms. To this end, we used a co-similarity based approach named X-Sim, which we introduced in previous work. However, as co-similarity methods are not highly scalable, we had to implement a "divide and conquer" approach that splits the categories into a set of clusters containing semantically related documents. This leads to a two-stage strategy for document categorization: first, we decide which cluster the test document belongs to; then, inside the elected cluster, we perform the final categorization with our co-similarity approach.
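The two-stage strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: X-Sim is not specified in this record, so plain cosine similarity over bag-of-words counts stands in for the co-similarity measure, and the function names (`bow`, `cosine`, `pooled`, `categorize`) and the toy `clusters` data are hypothetical. Unlike a co-similarity measure, this stand-in only matches documents that share terms; the sketch is meant to show the control flow (elect a cluster, then categorize inside it), nothing more.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def pooled(texts):
    """Sum the term counts of several texts (cosine ignores the scale)."""
    total = Counter()
    for text in texts:
        total.update(bow(text))
    return total

def categorize(doc, clusters):
    """Two-stage categorization.

    clusters maps cluster name -> {category name -> list of training texts}.
    ASSUMPTION: cosine similarity replaces the paper's co-similarity (X-Sim).
    """
    query = bow(doc)

    # Stage 1: elect the cluster whose pooled documents best match the query.
    def cluster_score(name):
        all_texts = [t for texts in clusters[name].values() for t in texts]
        return cosine(query, pooled(all_texts))
    best_cluster = max(clusters, key=cluster_score)

    # Stage 2: compare the query only with categories of the elected cluster.
    categories = clusters[best_cluster]
    best_category = max(categories,
                        key=lambda c: cosine(query, pooled(categories[c])))
    return best_cluster, best_category

# Hypothetical toy data, for illustration only.
clusters = {
    "sports": {
        "football": ["goal striker penalty match", "football league goal"],
        "tennis": ["serve racket match set", "tennis grand slam serve"],
    },
    "computing": {
        "databases": ["sql index query table", "database transaction query"],
        "networks": ["router packet tcp latency", "network packet switch"],
    },
}

print(categorize("the striker scored a late goal", clusters))
```

Restricting stage 2 to the elected cluster is what keeps the expensive similarity computation small: it only ever runs over the categories of one cluster rather than the full hierarchy.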
Document type: Conference paper
MARAMI 2011 - Seconde conférence sur les Modèles et l'Analyse des Réseaux : Approches Mathématiques et Informatique, Oct 2011, Grenoble, France. 16 p., 2011

https://hal.archives-ouvertes.fr/hal-00743577
Contributor: Gilles Bisson
Submitted on: Friday, 19 October 2012 - 14:46:30
Last modified on: Tuesday, 28 October 2014 - 18:35:12
Document(s) archived on: Saturday, 17 December 2016 - 02:25:17

File

Bisson-Grimal-MARAMI2011.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00743577, version 1

Citation

Clément Grimal, Gilles Bisson. Using a co-similarity approach on a large scale text categorization task. MARAMI 2011 - Seconde conférence sur les Modèles et l'Analyse des Réseaux : Approches Mathématiques et Informatique, Oct 2011, Grenoble, France. 16 p., 2011. <hal-00743577>
