Conference paper. Year: 2004

A methodology for topographic clustering of structured text documents

Marie-Jeanne Lesot
Delphine Dard
Florence d'Alché-Buc

Abstract

Sets of texts are structured through a more or less refined hierarchy of sections, subsections and paragraphs. This structure carries information that should be exploited when handling such data, in particular to enrich the comparison of texts as a complement to the vector description of their contents. We propose a kernel-based methodology that follows this principle for a topographic clustering task, and define a hierarchical kernel that compares paragraphs using the available hierarchical decomposition, in particular the provided titles.
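
The record does not give the kernel's definition, so the following is only a minimal sketch of how a paragraph kernel might combine content with the titles of the enclosing sections. Everything in it is an assumption made for illustration: the function names, the `titles`/`text` paragraph layout, the fixed `depth`, the `title_weight` parameter, and the choice of cosine similarity on bags of words. It is not the authors' formulation.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Term-frequency vector of a text (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse term-frequency vectors; 0.0 if either is empty."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hierarchical_kernel(p, q, depth=2, title_weight=0.5):
    """Hypothetical paragraph kernel: content similarity blended with title similarity.

    p, q: dicts with "titles" (enclosing titles, top level first) and "text".
    Titles are padded with empty strings to a fixed depth so that every pair of
    paragraphs is compared level by level with the same fixed weights.
    """
    content = cosine(bag_of_words(p["text"]), bag_of_words(q["text"]))
    titles_p = (list(p["titles"]) + [""] * depth)[:depth]
    titles_q = (list(q["titles"]) + [""] * depth)[:depth]
    title_sim = sum(
        cosine(bag_of_words(a), bag_of_words(b)) for a, b in zip(titles_p, titles_q)
    ) / depth
    return (1.0 - title_weight) * content + title_weight * title_sim
```

Because the weights are fixed and each term is a cosine kernel, the result is a convex combination of positive semi-definite kernels and can therefore be passed to any kernel-based clustering method. With this sketch, two paragraphs with unrelated text that sit under identically titled sections still receive a non-zero similarity, which is the kind of enrichment the abstract describes.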
File not deposited

Dates and versions

hal-01520690, version 1 (10-05-2017)

Identifiers

  • HAL Id: hal-01520690, version 1

Cite

Marie-Jeanne Lesot, Delphine Dard, Florence d'Alché-Buc. A methodology for topographic clustering of structured text documents. PASCAL Workshop on Learning Methods for Text Understanding and Mining, 2004, Grenoble, France. ⟨hal-01520690⟩