Using a co-similarity approach on a large scale text categorization task

Clément Grimal; Gilles Bisson

Conference paper, Year: 2011

Clément Grimal

Role: Author
PersonId: 931225

Analyse de données, Modélisation et Apprentissage automatique [Grenoble]

Gilles Bisson

Role: Corresponding author
PersonId: 931222

Analyse de données, Modélisation et Apprentissage automatique [Grenoble]

Abstract

This paper presents a framework we developed for the second Large Scale Hierarchical Text Categorization challenge (LSHTC2). The main idea is to handle term variability across categories, so that similarities can be found between collections of documents that belong to the same category but share few terms. To this end, we use a co-similarity based approach, named X-Sim, that we introduced in previous work. However, as co-similarity methods are not highly scalable, we need a "divide and conquer" approach that splits the categories into a set of clusters containing semantically related documents. This leads to a two-stage strategy for document categorization: first, we decide which cluster the test document belongs to; then, inside the selected cluster, we perform the final categorization, which is based on our co-similarity approach.
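The two-stage strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: plain cosine similarity over bag-of-words centroids stands in for the X-Sim co-similarity measure, and the function names (vectorize, train, categorize), the toy documents, and the category-to-cluster mapping are all hypothetical, introduced only for illustration. How the real clusters are built is not specified here, so the mapping is simply given as input.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors.

    Stand-in for the co-similarity measure; unlike co-similarity, it
    yields 0 for documents that share no terms at all.
    """
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Sum of term-count vectors (cosine ignores scale, so no averaging)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def train(labeled_docs, category_to_cluster):
    """Build one centroid per cluster and one per category inside each cluster."""
    by_category = defaultdict(list)
    for text, category in labeled_docs:
        by_category[category].append(vectorize(text))
    category_centroids = defaultdict(dict)
    for category, vectors in by_category.items():
        cluster = category_to_cluster[category]
        category_centroids[cluster][category] = centroid(vectors)
    cluster_centroids = {
        cluster: centroid(cats.values())
        for cluster, cats in category_centroids.items()
    }
    return cluster_centroids, category_centroids


def categorize(text, cluster_centroids, category_centroids):
    """Stage 1: pick the closest cluster. Stage 2: pick a category inside it."""
    vec = vectorize(text)
    cluster = max(cluster_centroids, key=lambda c: cosine(vec, cluster_centroids[c]))
    candidates = category_centroids[cluster]
    category = max(candidates, key=lambda c: cosine(vec, candidates[c]))
    return cluster, category


# Hypothetical toy data: four categories grouped into two clusters.
docs = [
    ("python code compiler bug", "programming"),
    ("database sql query index", "databases"),
    ("football goal match league", "football"),
    ("tennis serve match racket", "tennis"),
]
category_to_cluster = {
    "programming": "computing",
    "databases": "computing",
    "football": "sports",
    "tennis": "sports",
}

cluster_centroids, category_centroids = train(docs, category_to_cluster)
result = categorize("sql index tuning", cluster_centroids, category_centroids)
print(result)
```

The point of the split is that the expensive similarity computation in the second stage only sees the categories of one cluster, rather than the full category set.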

Keywords

Categorization; text mining; large-scale database

Domains

Machine Learning [cs.LG]; Artificial Intelligence [cs.AI]

Main file

Bisson-Grimal-MARAMI2011.pdf (374.75 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00743577

Submitted on: Friday, 19 October 2012, 14:46:30

Last modified on: Thursday, 4 April 2024, 21:05:29

Long-term archiving on: Saturday, 17 December 2016, 02:25:17

Dates and versions

hal-00743577, version 1 (19-10-2012)

Identifiers

HAL Id: hal-00743577, version 1

Cite

Clément Grimal, Gilles Bisson. Using a co-similarity approach on a large scale text categorization task. MARAMI 2011 - Seconde conférence sur les Modèles et l'Analyse des Réseaux : Approches Mathématiques et Informatique, Oct 2011, Grenoble, France. 16p. ⟨hal-00743577⟩

Collections

UGA CNRS LIG LIG_TDCGE LIG_TDCGE_AMA LIG_SIDCH LIG_SIDCH_APTIKAL

106 views

85 downloads
