Conference paper. Year: 2011

Using a co-similarity approach on a large scale text categorization task

Abstract

This paper presents the framework we developed for the second Large Scale Hierarchical Text Categorization challenge (LSHTC2). The main idea is to propose a method that handles term variability across categories, so that we can find similarities between sets of documents that belong to the same category but share few terms. To this end, we used a co-similarity-based approach, named X-Sim, which we introduced in previous work. However, since such co-similarity methods do not scale well, we implemented a "divide and conquer" approach that splits the categories into a set of clusters containing semantically related documents. This leads to a two-stage strategy for document categorization: first, we decide to which cluster the test document belongs; then, within the selected cluster, we perform the final categorization using our co-similarity approach.
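The two-stage strategy described in the abstract can be sketched in plain Python. This is a minimal illustration in the spirit of X-Sim, not the authors' X-Sim algorithm or their LSHTC2 pipeline: the simplified co-similarity update (SR ← norm(M·SC·Mᵀ), then SC ← norm(Mᵀ·SR·M), with a cosine-style normalization), the centroid-based cluster election, the mean-similarity decision rule, the data layout, and all function names (`co_similarity`, `categorize`, etc.) are hypothetical choices made for the example. It shows the property the abstract relies on: from the second iteration on, two documents with no term in common can receive a non-zero similarity, because their terms have become similar through other documents.

```python
from math import sqrt


def matmul(a, b):
    """Plain-Python matrix product; matrices are lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def normalize(s):
    """Cosine-style rescaling s_ij / sqrt(s_ii * s_jj): the diagonal becomes 1."""
    n = len(s)
    return [[s[i][j] / sqrt(s[i][i] * s[j][j]) if s[i][i] > 0 and s[j][j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def co_similarity(m, iterations=2):
    """Return the document-document similarity matrix SR.

    Assumed (simplified) update, alternating rows and columns:
        SR <- normalize(M SC M^T), then SC <- normalize(M^T SR M)
    From the second iteration on, two documents can be similar even when
    they share no term, as long as their terms have become similar.
    """
    mt = transpose(m)
    sr = identity(len(m))
    sc = identity(len(m[0]))  # terms start out similar only to themselves
    for _ in range(iterations):
        sr = normalize(matmul(matmul(m, sc), mt))
        sc = normalize(matmul(matmul(mt, sr), m))
    return sr


def cosine(u, v):
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def categorize(test_doc, clusters, iterations=2):
    """Hypothetical two-stage categorizer.

    clusters maps a cluster name to a list of (term_vector, category) pairs.
    Stage 1: elect the cluster whose centroid is most cosine-similar.
    Stage 2: run co_similarity on that cluster's documents plus the test
    document; return the category with the highest mean similarity.
    """
    def centroid(docs):
        return [sum(col) / len(docs) for col in zip(*(vec for vec, _ in docs))]

    elected = max(clusters, key=lambda c: cosine(test_doc, centroid(clusters[c])))
    docs = clusters[elected]
    sr = co_similarity([vec for vec, _ in docs] + [test_doc], iterations)
    scores = {}
    for i, (_, category) in enumerate(docs):
        scores.setdefault(category, []).append(sr[-1][i])  # last row = test doc
    best = max(scores, key=lambda c: sum(scores[c]) / len(scores[c]))
    return elected, best


if __name__ == "__main__":
    # Documents 0 and 2 share no term; terms: a, b, c, d.
    m = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
    print(co_similarity(m, 1)[0][2])            # 0.0
    print(round(co_similarity(m, 2)[0][2], 3))  # 0.333

    # Vocabulary: goal, match, vote, ballot, senate, bill.
    clusters = {
        "sport": [([1, 1, 0, 0, 0, 0], "football")],
        "politics": [([0, 0, 1, 1, 0, 0], "elections"),
                     ([0, 0, 0, 0, 1, 1], "legislation")],
    }
    print(categorize([0, 0, 1, 0, 0, 0], clusters))  # ('politics', 'elections')
```

The split into clusters is what keeps the cost manageable in this sketch: `co_similarity` builds dense matrices whose size grows with the number of documents and terms, so it is only run on the documents of the elected cluster rather than on the whole collection.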
Main file
Bisson-Grimal-MARAMI2011.pdf (374.75 KB)
Origin: files produced by the author(s)

Dates and versions

hal-00743577, version 1 (19-10-2012)

Identifiers

  • HAL Id: hal-00743577, version 1

Cite

Clément Grimal, Gilles Bisson. Using a co-similarity approach on a large scale text categorization task. MARAMI 2011 - Seconde conférence sur les Modèles et l'Analyse des Réseaux : Approches Mathématiques et Informatique, Oct 2011, Grenoble, France. 16 p. ⟨hal-00743577⟩
