Conference paper, Year: 2014

Short Text Classification Using Semantic Random Forest

Abstract

Traditional Random Forests perform worse on short texts than on standard texts. The causes of this degradation are the shortness, sparseness, and lack of contextual information of short texts. Existing solutions to these issues rely mainly on data enrichment, but data enrichment can also introduce noise. We propose a new approach that combines data enrichment with the introduction of semantics into Random Forests. Each short text is enriched with data semantically similar to its words. These data come from an external knowledge source organized into topics by the Latent Dirichlet Allocation (LDA) model. The learning process of the Random Forests is adapted to take semantic relations between words into account while building the trees. Tests performed on search-snippets with the new method showed significant improvements in classification: accuracy increased by 34% compared to traditional Random Forests and by 20% compared to MaxEnt.
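The enrichment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `TOPICS` table, its word weights, the `enrich` function, and the `top_k` parameter are all hypothetical, and the hard-coded topics merely stand in for the output of an LDA model trained on an external corpus. The semantic adaptation of tree building is not covered here.

```python
# Hypothetical topics with made-up word weights. In the paper, topics come
# from LDA applied to an external knowledge source; these are stand-ins.
TOPICS = {
    "sports": {"football": 0.30, "match": 0.25, "league": 0.20,
               "goal": 0.15, "team": 0.10},
    "computing": {"software": 0.30, "python": 0.25, "code": 0.20,
                  "compiler": 0.15, "bug": 0.10},
}


def enrich(text, topics=TOPICS, top_k=3):
    """Return the text's tokens plus up to top_k words from its closest topic.

    Assumption: "closest" means the topic with the largest summed weight
    over the words it shares with the text.
    """
    tokens = text.lower().split()
    # Score each topic by the weights of the text's words that it contains.
    scores = {
        name: sum(words.get(tok, 0.0) for tok in tokens)
        for name, words in topics.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0.0:
        # No topic shares a word with the text: leave it unchanged.
        return tokens
    # Add the highest-weight topic words that the text does not already have.
    ranked = sorted(topics[best], key=topics[best].get, reverse=True)
    extra = [word for word in ranked if word not in tokens][:top_k]
    return tokens + extra
```

Calling `enrich("python bug report")` matches the hypothetical "computing" topic and appends its strongest missing words, so the enriched text gives a classifier more features to split on. Capping the additions with `top_k` is one simple way to limit the noise that the abstract attributes to enrichment.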
File not deposited

Dates and versions

hal-01325212, version 1 (02-06-2016)

Identifiers

Cite

Ameni Bouaziz, Christel Dartigues-Pallez, Célia da Costa Pereira, Frédéric Precioso, Patrick Lloret. Short Text Classification Using Semantic Random Forest. Data Warehousing and Knowledge Discovery, Sep 2014, Munich, Germany. ⟨10.1007/978-3-319-10160-6_26⟩. ⟨hal-01325212⟩