A new feature selection and feature contrasting approach based on quality metric: application to efficient classification of complex textual data

Abstract: Feature maximization is a cluster quality metric that favors clusters with maximum feature representation with regard to their associated data. This metric has already been used successfully to define unbiased clustering quality indexes, to label clusters efficiently, and to replace distance in the clustering process, as in the IGNGF incremental clustering method. In this paper we go one step further, showing that a straightforward adaptation of this metric provides a highly efficient feature selection and feature contrasting model for supervised classification. In particular, we show that this technique can enhance the performance of classification methods while very significantly outperforming (+80%) state-of-the-art variable selection techniques when classifying unbalanced, highly multidimensional and noisy textual data with a high degree of similarity between classes. Our experimental dataset is a reference dataset of 7000 publications related to patent classes drawn from a reference classification in the domain of pharmacology.
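The abstract does not state the formulas behind the metric. As a rough illustration only, the sketch below assumes the definitions commonly given in the feature maximization literature: a feature's recall in a class is its share of that feature's total weight, its precision is its share of the class's total weight, and the feature F-measure is their harmonic mean. The function name `feature_f_measure` and the toy data are hypothetical and are not taken from the paper.

```python
def feature_f_measure(weights, labels):
    """Return {class: [F-measure per feature]} for a documents-by-features matrix.

    Assumed definitions (not stated in the abstract):
      recall    FR_c(f) = weight of f in class c / weight of f in all classes
      precision FP_c(f) = weight of f in class c / weight of all features in class c
      FF_c(f)   = harmonic mean of FR_c(f) and FP_c(f)
    """
    n_features = len(weights[0])

    # Sum the feature weights of the documents belonging to each class.
    class_sums = {}
    for row, label in zip(weights, labels):
        sums = class_sums.setdefault(label, [0.0] * n_features)
        for j, w in enumerate(row):
            sums[j] += w

    # Total weight of each feature over all classes.
    feature_totals = [
        sum(sums[j] for sums in class_sums.values()) for j in range(n_features)
    ]

    scores = {}
    for label, sums in class_sums.items():
        class_total = sum(sums)
        row_scores = []
        for j in range(n_features):
            recall = sums[j] / feature_totals[j] if feature_totals[j] else 0.0
            precision = sums[j] / class_total if class_total else 0.0
            denom = recall + precision
            row_scores.append(2 * recall * precision / denom if denom else 0.0)
        scores[label] = row_scores
    return scores


# Toy data (hypothetical): 4 documents, 3 term features, 2 classes.
weights = [
    [3.0, 0.0, 1.0],
    [2.0, 1.0, 1.0],
    [0.0, 4.0, 1.0],
    [0.0, 3.0, 1.0],
]
labels = ["A", "A", "B", "B"]
scores = feature_f_measure(weights, labels)
```

On this toy data, the first feature scores highest in class A and the second in class B, while the third, spread evenly across both classes, scores low in each. A selection step could then keep, for each class, only the features whose score stands out; the exact selection and contrasting rules used by the authors are not given in the abstract.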
Document type:
Conference paper
International Workshop on Quality Issues, Measures of Interestingness and Evaluation of Data Mining Models (QIMIE), Apr 2013, Australia. 2013

Cited literature: 38 references

https://hal.archives-ouvertes.fr/hal-00960127
Contributor: Patricia Gautier
Submitted on: Tuesday, 18 March 2014 - 09:06:13
Last modified on: Tuesday, 24 April 2018 - 13:36:30
Document(s) archived on: Wednesday, 18 June 2014 - 10:43:51

File

qimie2013_submission_14.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00960127, version 1

Citation

Jean-Charles Lamirel, Pascal Cuxac, Kafil Hajlaoui, Aneesh Sreevallabh Chivukula. A new feature selection and feature contrasting approach based on quality metric: application to efficient classification of complex textual data. International Workshop on Quality Issues, Measures of Interestingness and Evaluation of Data Mining Models (QIMIE), Apr 2013, Australia. 2013. 〈hal-00960127〉

Metrics

Record views: 383

File downloads: 624