Conference papers

A new feature selection and feature contrasting approach based on quality metric: application to efficient classification of complex textual data

Abstract : Feature maximization is a cluster quality metric that favors clusters with maximum feature representation with regard to their associated data. This metric has already been exploited successfully for several purposes: defining unbiased clustering quality indexes, labeling clusters efficiently, and replacing distance in the clustering process itself, as in the IGNGF incremental clustering method. In this paper we go one step further and show that a straightforward adaptation of this metric provides a highly efficient feature selection and feature contrasting model for supervised classification. In particular, we show that this technique can enhance the performance of classification methods while very significantly outperforming (+80%) state-of-the-art variable selection techniques on unbalanced, highly multidimensional and noisy textual data with a high degree of similarity between classes. Our experimental dataset is a reference dataset of 7,000 publications associated with patent classes drawn from a reference classification in the domain of pharmacology.
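The abstract does not spell out the metric. As a hedged illustration only, the sketch below assumes the formulation commonly associated with feature maximization: for a class c and feature f, feature recall is the share of f's total weight that falls in c, feature precision is the share of c's total weight carried by f, and the feature F-measure is their harmonic mean. The selection rule (keep f for c when its F-measure beats both f's mean over classes and the global mean) and the contrast ratio (F-measure divided by f's mean over classes) are assumptions for illustration, not the paper's exact procedure; the function names and toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def feature_f_measures(docs, labels):
    """Per-class feature F-measure (assumed formulation, see lead-in).

    docs: list of dicts mapping feature -> non-negative weight (e.g. term counts).
    labels: class label of each document, in the same order as docs.
    Returns {class: {feature: F-measure}} for features with non-zero weight in a class.
    """
    # Total weight of each feature within each class.
    class_feat = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for feat, w in doc.items():
            class_feat[label][feat] += w

    # Total weight of each feature over all classes (recall denominator).
    feat_total = defaultdict(float)
    for weights in class_feat.values():
        for feat, w in weights.items():
            feat_total[feat] += w

    ff = {}
    for label, weights in class_feat.items():
        class_total = sum(weights.values())  # precision denominator
        ff[label] = {}
        for feat, w in weights.items():
            if w <= 0:
                continue
            recall = w / feat_total[feat]
            precision = w / class_total
            ff[label][feat] = 2 * recall * precision / (recall + precision)
    return ff


def select_and_contrast(ff):
    """Hypothetical selection rule: keep a feature for a class when its F-measure
    exceeds both the feature's mean over classes and the global mean, and return
    its contrast ratio (F-measure / feature's mean over classes)."""
    classes = list(ff)
    features = {f for per_class in ff.values() for f in per_class}
    # Mean F-measure of each feature across all classes (0 where absent).
    feat_mean = {f: mean(ff[c].get(f, 0.0) for c in classes) for f in features}
    # Global mean over all (class, feature) pairs with a non-zero F-measure.
    global_mean = mean(v for per_class in ff.values() for v in per_class.values())
    return {
        c: {
            f: v / feat_mean[f]
            for f, v in ff[c].items()
            if v > feat_mean[f] and v > global_mean
        }
        for c in classes
    }


# Toy, made-up data: two classes of bag-of-words documents.
docs = [
    {"kinase": 3, "inhibitor": 1},
    {"kinase": 2, "dose": 1},
    {"vaccine": 3, "dose": 2},
    {"vaccine": 1, "antigen": 2},
]
labels = ["A", "A", "B", "B"]

print(select_and_contrast(feature_f_measures(docs, labels)))
# {'A': {'kinase': 2.0}, 'B': {'vaccine': 2.0}}
```

In this toy run, "dose" is dropped because it is shared by both classes, while each class keeps the one feature concentrated in it; under this assumed scheme, the contrast ratio could then be used to up-weight the kept features in each class's data before training a classifier.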

Cited literature : 38 references

https://hal.archives-ouvertes.fr/hal-00960127
Contributor : Patricia Gautier
Submitted on : Tuesday, March 18, 2014 - 9:06:13 AM
Last modification on : Wednesday, March 18, 2020 - 2:56:41 PM
Archived on : Wednesday, June 18, 2014 - 10:43:51 AM

File

qimie2013_submission_14.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00960127, version 1


Citation

Jean-Charles Lamirel, Pascal Cuxac, Kafil Hajlaoui, Aneesh Sreevallabh Chivukula. A new feature selection and feature contrasting approach based on quality metric: application to efficient classification of complex textual data. International Workshop on Quality Issues, Measures of Interestingness and Evaluation of Data Mining Models (QIMIE), Apr 2013, Australia. ⟨hal-00960127⟩


Metrics

Record views

445

File downloads

1099