Real-time unsupervised classification of web documents - HAL open archive
Conference paper, Year: 2009

Real-time unsupervised classification of web documents

Anthony Sigogne
Mathieu Constant

Abstract

This paper addresses the problem of clustering dynamic collections of web documents. We present an iterative algorithm based on fine-grained keyword extraction (simple words, compound words, and proper nouns). Each new document inserted into the collection is assigned either to an existing class containing documents on the same topic or to a new class. After each step, classes are refined using statistical techniques when necessary. The implementation of this algorithm was successfully integrated into an application used for Information Intelligence.
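
The assign-or-create step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a similarity measure or a threshold, so the Jaccard overlap, the `threshold` value, and the names `TopicClass` and `insert_document` are all assumptions made for illustration. Keyword extraction is assumed to have already happened, and the statistical refinement step is left out.

```python
from dataclasses import dataclass, field


@dataclass
class TopicClass:
    """A class of documents: member ids plus the union of their keywords."""
    doc_ids: list = field(default_factory=list)
    keywords: set = field(default_factory=set)


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two keyword sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def insert_document(classes, doc_id, keywords, threshold=0.2):
    """Assign a document to the most similar class, or open a new class.

    The threshold is a hypothetical value, not one taken from the paper.
    Returns the index of the class that received the document.
    """
    best_index, best_score = None, 0.0
    for index, topic in enumerate(classes):
        score = jaccard(keywords, topic.keywords)
        if score > best_score:
            best_index, best_score = index, score

    # No class shares enough keywords: start a new class for this topic.
    if best_index is None or best_score < threshold:
        classes.append(TopicClass())
        best_index = len(classes) - 1

    classes[best_index].doc_ids.append(doc_id)
    classes[best_index].keywords |= keywords
    return best_index


# Documents arrive one at a time, each with pre-extracted keywords.
classes = []
stream = [
    ("d1", {"election", "parliament", "vote"}),
    ("d2", {"football", "league", "goal"}),
    ("d3", {"election", "vote", "candidate"}),
]
assignments = [insert_document(classes, doc_id, kw) for doc_id, kw in stream]
```

With this toy stream, `d1` and `d2` each open a class and `d3` joins the class of `d1`, so `assignments` is `[0, 1, 0]`. Representing a class by the union of its keywords keeps insertion cheap, but it is only one possible choice; a weighted keyword profile would be another.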
Main file
123.pdf (77.19 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00722749, version 1 (03-08-2012)

Cite

Anthony Sigogne, Mathieu Constant. Real-time unsupervised classification of web documents. 4th International Multiconference on Computer Science and Information Technology (IMCSIT'09), Oct 2009, Mragowo, Poland. pp.281-286, ⟨10.1109/IMCSIT.2009.5352714⟩. ⟨hal-00722749⟩
124 views
209 downloads
