Short Text Classification Using Semantic Random Forest

Abstract: Traditional Random Forests perform worse on short texts than on standard texts. The shortness, sparseness and lack of contextual information in short texts are the reasons for this degradation. Existing solutions to these issues are mainly based on data enrichment. However, data enrichment can also introduce noise. We propose a new approach that combines data enrichment with the introduction of semantics in Random Forests. Each short text is enriched with data semantically similar to its words. These data come from an external source of knowledge distributed into topics by the Latent Dirichlet Allocation model. The learning process in Random Forests is adapted to consider semantic relations between words while building the trees. Tests performed on search snippets using the new method showed significant improvements in classification. Accuracy increased by 34% compared to traditional Random Forests and by 20% compared to MaxEnt.
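
The enrichment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `TOPICS` dictionary, the `enrich` function and the `top_k` parameter are hypothetical names, the topics are hand-written toy data standing in for LDA output, and plain word overlap stands in for the semantic similarity used in the paper. The adapted tree-building step is not covered.

```python
from collections import Counter

# Hypothetical topics. In the paper they come from LDA applied to an
# external knowledge source; here they are hand-written toy data.
TOPICS = {
    "sports": ["football", "match", "team", "league", "goal"],
    "computing": ["software", "computer", "program", "code", "algorithm"],
}


def enrich(text, topics, top_k=1):
    """Append the words of the top_k topics that share the most words with text.

    Word overlap is an assumed, simplified stand-in for semantic similarity.
    """
    words = text.lower().split()

    # Count how many words of the text appear in each topic.
    overlap = Counter()
    for name, topic_words in topics.items():
        vocab = set(topic_words)
        overlap[name] = sum(1 for w in words if w in vocab)

    enriched = list(words)
    present = set(words)
    for name, count in overlap.most_common(top_k):
        if count == 0:
            continue  # no shared word: add nothing rather than add noise
        for w in topics[name]:
            if w not in present:
                enriched.append(w)
                present.add(w)
    return enriched


if __name__ == "__main__":
    print(enrich("team wins the match", TOPICS))
```

Skipping topics with no shared word reflects the abstract's remark that enrichment can also introduce noise; a real system would need a more careful selection criterion.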
Document type:
Conference paper
Data Warehousing and Knowledge Discovery, Sep 2014, Munich, Germany. 8646, 2014, Lecture Notes in Computer Science. <10.1007/978-3-319-10160-6_26>

https://hal.archives-ouvertes.fr/hal-01325212
Contributor: Frédéric Precioso
Submitted on: Thursday, 2 June 2016 - 00:54:53
Last modified on: Friday, 3 June 2016 - 01:00:35

Citation

Ameni Bouaziz, Christel Dartigues-Pallez, Célia Da Costa Pereira, Frédéric Precioso, Patrick Lloret. Short Text Classification Using Semantic Random Forest. Data Warehousing and Knowledge Discovery, Sep 2014, Munich, Germany. 8646, 2014, Lecture Notes in Computer Science. <10.1007/978-3-319-10160-6_26>. <hal-01325212>
