Book sections

Using Kullback-Leibler Distance for Text Categorization

Brigitte Bigi 1
1 ADELE - Environnements et outils pour le Génie Logiciel Industriel
LIG - Laboratoire d'Informatique de Grenoble
Abstract : A text categorization system assigns appropriate categories from a predefined classification scheme to incoming documents. These assignments can serve varied purposes such as filtering or retrieval. This paper introduces a new, effective model for text categorization over large corpora (roughly 1 million documents). Categorization is performed using the Kullback-Leibler distance (KLD) between the probability distribution of the document to classify and the probability distribution of each category. With the same representation of categories, experiments show that the KLD method achieves substantial improvements over the tf-idf method.
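The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the textbook Kullback-Leibler divergence, unigram term distributions, and simple additive smoothing, whereas the paper's exact distance, smoothing, and normalization may differ. The names `term_distribution`, `kl_divergence`, `classify`, and the parameter `alpha` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def term_distribution(tokens, vocabulary, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary.

    Assumption: additive (Laplace-style) smoothing with pseudo-count `alpha`,
    used here only so that every term has non-zero probability.
    """
    counts = Counter(t for t in tokens if t in vocabulary)
    total = sum(counts.values()) + alpha * len(vocabulary)
    return {term: (counts[term] + alpha) / total for term in vocabulary}


def kl_divergence(p, q):
    """Textbook D(p || q) = sum_x p(x) * log(p(x) / q(x)).

    Both distributions must be defined, and non-zero, on the same terms.
    """
    return sum(p[x] * math.log(p[x] / q[x]) for x in p)


def classify(doc_tokens, category_tokens, alpha=1.0):
    """Return the category whose distribution is closest to the document's."""
    # Shared vocabulary: every term seen in the document or in any category.
    vocabulary = set(doc_tokens)
    for tokens in category_tokens.values():
        vocabulary.update(tokens)

    p_doc = term_distribution(doc_tokens, vocabulary, alpha)
    distances = {
        name: kl_divergence(p_doc, term_distribution(tokens, vocabulary, alpha))
        for name, tokens in category_tokens.items()
    }
    return min(distances, key=distances.get), distances


# Toy data, invented purely for illustration.
categories = {
    "sports": "match goal team goal win".split(),
    "finance": "bank market stock bank rate".split(),
}
label, distances = classify("team win goal".split(), categories)
print(label, distances)
```

In this toy setting the document is assigned to the category with the smallest divergence. Smoothing matters because an unsmoothed category distribution would give zero probability to unseen terms and make the divergence undefined.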

https://hal.archives-ouvertes.fr/hal-01392500
Contributor : Brigitte Bigi
Submitted on : Friday, November 12, 2021 - 7:45:57 AM
Last modification on : Monday, November 29, 2021 - 2:39:41 PM

File

bigi2003ecir.pdf
Files produced by the author(s)

Citation

Brigitte Bigi. Using Kullback-Leibler Distance for Text Categorization. Advances in Information Retrieval, 2633, Springer Berlin Heidelberg, pp. 305-319, 2003, ⟨10.1007/3-540-36618-0_22⟩. ⟨hal-01392500⟩
