A new feature selection and feature contrasting approach based on quality metric: application to efficient classification of complex textual data

Jean-Charles Lamirel; Pascal Cuxac; Kafil Hajlaoui; Aneesh Sreevallabh Chivukula

Conference paper, Year: 2013


Jean-Charles Lamirel

Role: Author
PersonId : 8202
IdHAL : jean-charles-lamirel

Natural Language Processing : representations, inference and semantics

Pascal Cuxac

Role: Author
PersonId : 179348
IdHAL : pascal-cuxac
ORCID : 0000-0002-6809-5654
IdRef : 165835257

Institut de l'information scientifique et technique

Kafil Hajlaoui

Role: Author

Institut de l'information scientifique et technique

Aneesh Sreevallabh Chivukula

Role: Author

International Institute of Information Technology [Hyderabad]

Abstract

Feature maximization is a cluster quality metric that favors clusters with maximum feature representation with regard to their associated data. This metric has already been successfully exploited for defining unbiased clustering quality indexes, for efficient cluster labeling, and as a substitute for distance in the clustering process, as in the IGNGF incremental clustering method. In this paper we go one step further and show that a straightforward adaptation of this metric can provide a highly efficient feature selection and feature contrasting model in the context of supervised classification. More specifically, we show that this technique can enhance the performance of classification methods while very significantly outperforming (+80%) state-of-the-art variable selection techniques when classifying unbalanced, highly multidimensional and noisy textual data with a high degree of similarity between the classes. Our experimental dataset is a reference dataset of 7000 publications related to patent classes drawn from a reference classification in the domain of pharmacology.
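The abstract does not state the formulas behind the metric. As a minimal sketch only, the code below assumes one common formulation of feature maximization: for each class and feature, a feature recall (the share of the feature's total weight that falls in the class) and a feature precision (the share of the class's total weight carried by the feature) are combined into a harmonic mean. The selection rule (keep a feature for a class when its score exceeds both that feature's mean score across classes and the global mean) is likewise an assumption, and the names feature_f_measure and select_features are hypothetical, not taken from the paper.

```python
from typing import List


def feature_f_measure(weights: List[List[float]],
                      labels: List[int],
                      n_classes: int) -> List[List[float]]:
    """Return ff[c][f]: harmonic mean of feature recall and feature precision.

    weights[d][f] is the (non-negative) weight of feature f in document d,
    labels[d] is the class index of document d.
    """
    n_features = len(weights[0])
    # class_sum[c][f]: total weight of feature f over the documents of class c
    class_sum = [[0.0] * n_features for _ in range(n_classes)]
    for row, c in zip(weights, labels):
        for f, w in enumerate(row):
            class_sum[c][f] += w

    # feature_total[f]: total weight of feature f over all classes
    feature_total = [sum(class_sum[c][f] for c in range(n_classes))
                     for f in range(n_features)]

    ff = [[0.0] * n_features for _ in range(n_classes)]
    for c in range(n_classes):
        class_total = sum(class_sum[c])  # total weight of all features in class c
        for f in range(n_features):
            if class_total == 0 or feature_total[f] == 0:
                continue  # score stays 0 when the class or the feature is empty
            recall = class_sum[c][f] / feature_total[f]
            precision = class_sum[c][f] / class_total
            if recall + precision > 0:
                ff[c][f] = 2 * recall * precision / (recall + precision)
    return ff


def select_features(ff: List[List[float]]) -> List[List[int]]:
    """Assumed rule: keep f for class c if ff[c][f] beats both the mean score
    of f across classes and the global mean of those per-feature means."""
    n_classes, n_features = len(ff), len(ff[0])
    feature_mean = [sum(ff[c][f] for c in range(n_classes)) / n_classes
                    for f in range(n_features)]
    global_mean = sum(feature_mean) / n_features
    return [[f for f in range(n_features)
             if ff[c][f] > feature_mean[f] and ff[c][f] > global_mean]
            for c in range(n_classes)]


# Toy data: three documents, two features, two classes.
weights = [[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]]
labels = [0, 0, 1]
ff = feature_f_measure(weights, labels, n_classes=2)
selected = select_features(ff)
```

On this toy input, feature 0 is retained for class 0 and feature 1 for class 1. A contrast step could then rescale each retained feature per class, for example by the ratio of its class score to its mean score, but that weighting is again an assumption rather than something the abstract specifies.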

Keywords

Supervised classification; Automatic classification; Variable selection; Text; Infometrics

Domains

Information retrieval [cs.IR]; Applications [stat.AP]

Main file

qimie2013_submission_14.pdf (150.21 KB)

Origin: Files produced by the author(s)

Contributor: Patricia Gautier

https://hal.science/hal-00960127

Submitted on: Tuesday, 18 March 2014, 09:06:13

Last modified on: Sunday, 8 October 2023, 04:10:29

Long-term archiving on: Wednesday, 18 June 2014, 10:43:51

Dates and versions

hal-00960127, version 1 (18-03-2014)

License

Attribution

Identifiers

HAL Id: hal-00960127, version 1

Cite

Jean-Charles Lamirel, Pascal Cuxac, Kafil Hajlaoui, Aneesh Sreevallabh Chivukula. A new feature selection and feature contrasting approach based on quality metric: application to efficient classification of complex textual data. International Workshop on Quality Issues, Measures of Interestingness and Evaluation of Data Mining Models (QIMIE 2013), Apr 2013, Gold Coast, Australia. ⟨hal-00960127⟩


Collections

CNRS INRIA UNIV-LORRAINE LORIA LORIA-NLPKD INIST

228 views

837 downloads
