Book chapter, Year: 2003

Using Kullback-Leibler Distance for Text Categorization

Abstract

A text categorization system assigns appropriate categories from a predefined classification scheme to incoming documents. These assignments can serve various purposes, such as filtering or retrieval. This paper introduces a new, effective model for text categorization on a large corpus (roughly 1 million documents). Categorization is performed using the Kullback-Leibler distance (KLD) between the probability distribution of the document to classify and the probability distribution of each category. With the same representation of categories, experiments show that the KLD method achieves substantial improvements over the tfidf method.
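The idea in the abstract, comparing a document's term distribution with each category's term distribution and choosing the closest category, can be sketched as follows. This is a minimal illustration and not the paper's implementation: the abstract does not give the exact form of the distance or the smoothing scheme, so the sketch assumes the standard KL divergence D(p || q) with additive smoothing. The names `term_distribution`, `kl_divergence`, `classify`, and the `alpha` parameter are hypothetical.

```python
import math
from collections import Counter


def term_distribution(tokens, vocabulary, alpha=1.0):
    """Additively smoothed term distribution over a fixed vocabulary.

    Assumption: additive (Laplace-style) smoothing with weight `alpha`,
    chosen here only so that every term has non-zero probability.
    """
    counts = Counter(t for t in tokens if t in vocabulary)
    total = sum(counts.values()) + alpha * len(vocabulary)
    return {term: (counts[term] + alpha) / total for term in vocabulary}


def kl_divergence(p, q):
    """Standard KL divergence D(p || q) = sum_x p(x) * log(p(x) / q(x)).

    Both distributions must be defined over the same terms, and q must be
    non-zero wherever p is non-zero (guaranteed by the smoothing above).
    """
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)


def classify(doc_tokens, category_tokens, alpha=1.0):
    """Return the category whose distribution is closest to the document's."""
    vocabulary = set(doc_tokens)
    for tokens in category_tokens.values():
        vocabulary.update(tokens)

    doc_dist = term_distribution(doc_tokens, vocabulary, alpha)
    scores = {
        name: kl_divergence(doc_dist, term_distribution(tokens, vocabulary, alpha))
        for name, tokens in category_tokens.items()
    }
    # Smaller divergence means the document is closer to the category.
    return min(scores, key=scores.get), scores


# Toy data, invented purely for illustration.
categories = {
    "sports": "match team goal team player".split(),
    "finance": "bank market stock bank rate".split(),
}
label, scores = classify("team goal player".split(), categories)
print(label, scores)
```

In this toy setting the document is assigned to the category with the smallest divergence. A real system would build each category distribution from many training documents and would need a smoothing strategy suited to a very large vocabulary.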
Main file
bigi2003ecir.pdf (279.64 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01392500, version 1 (12-11-2021)

Cite

Brigitte Bigi. Using Kullback-Leibler Distance for Text Categorization. Advances in Information Retrieval, 2633, Springer Berlin Heidelberg, pp.305-319, 2003, ⟨10.1007/3-540-36618-0_22⟩. ⟨hal-01392500⟩