Conference paper, Year: 2013

A new feature selection and feature contrasting approach based on quality metric: application to efficient classification of complex textual data

Abstract

Feature maximization is a cluster quality metric that favors clusters with maximum feature representation with regard to their associated data. This metric has already been successfully exploited for defining unbiased clustering quality indexes, for efficient cluster labeling, and as a substitute for distance in the clustering process, as in the IGNGF incremental clustering method. In this paper we go one step further, showing that a straightforward adaptation of this metric can provide a highly efficient feature selection and feature contrasting model in the context of supervised classification. In particular, we show that this technique can enhance the performance of classification methods while very significantly outperforming (+80%) state-of-the-art variable selection techniques when classifying unbalanced, highly multidimensional and noisy textual data with a high degree of similarity between the classes. Our experimental dataset is a reference dataset of 7000 publications related to patent classes drawn from a reference classification in the domain of pharmacology.
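The abstract does not give formulas, so the following is only a minimal sketch of how a feature-maximization style selection could look, under stated assumptions. It assumes the formulation commonly used in the feature maximization literature: a per-class feature recall (the share of a feature's total weight that falls in the class), a per-class feature precision (the share of the class's total weight carried by the feature), and their harmonic mean as a feature F-measure. A feature is then kept for a class when its F-measure exceeds both that feature's average over classes and the global average. The function names `feature_f_measure` and `select_features` are hypothetical, and averaging over all classes is a simplification rather than the authors' exact procedure.

```python
def feature_f_measure(weights, labels):
    """Return {class: [feature F-measure per feature]}.

    weights: list of rows (one per document) of non-negative feature weights.
    labels:  list of class labels, one per document.
    Assumed formulation, not taken from the abstract itself.
    """
    n_features = len(weights[0])
    classes = sorted(set(labels))

    # Sum of each feature's weight over the documents of each class.
    class_sums = {c: [0.0] * n_features for c in classes}
    for row, label in zip(weights, labels):
        for j, w in enumerate(row):
            class_sums[label][j] += w

    # Total weight of each feature over all classes (recall denominator).
    feature_totals = [sum(class_sums[c][j] for c in classes)
                      for j in range(n_features)]

    ff = {}
    for c in classes:
        class_total = sum(class_sums[c])  # precision denominator
        ff[c] = []
        for j in range(n_features):
            s = class_sums[c][j]
            recall = s / feature_totals[j] if feature_totals[j] else 0.0
            precision = s / class_total if class_total else 0.0
            denom = recall + precision
            ff[c].append(2 * recall * precision / denom if denom else 0.0)
    return ff


def select_features(ff):
    """Keep, per class, the feature indices whose F-measure exceeds both
    the feature's mean over classes and the global mean (simplified rule)."""
    classes = list(ff)
    n_features = len(ff[classes[0]])
    per_feature_mean = [sum(ff[c][j] for c in classes) / len(classes)
                        for j in range(n_features)]
    global_mean = sum(per_feature_mean) / n_features
    return {
        c: [j for j in range(n_features)
            if ff[c][j] > per_feature_mean[j] and ff[c][j] > global_mean]
        for c in classes
    }
```

With a toy matrix in which feature 0 occurs only in class "a", feature 1 only in class "b", and feature 2 equally in both, this rule keeps feature 0 for "a" and feature 1 for "b", and drops the shared feature for both classes.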
Main file
qimie2013_submission_14.pdf (150.21 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00960127, version 1 (18-03-2014)

License

Attribution

Identifiers

  • HAL Id: hal-00960127, version 1

Cite

Jean-Charles Lamirel, Pascal Cuxac, Kafil Hajlaoui, Aneesh Sreevallabh Chivukula. A new feature selection and feature contrasting approach based on quality metric: application to efficient classification of complex textual data. International Workshop on Quality Issues, Measures of Interestingness and Evaluation of Data Mining Models (QIMIE 2013), Apr 2013, Gold Coast, Australia. ⟨hal-00960127⟩
228 views
837 downloads
