Variable Selection for Clustering with Gaussian Mixture Models - HAL open archive
Report (Research Report), Year: 2007

Variable Selection for Clustering with Gaussian Mixture Models

Abstract

This article is concerned with variable selection for cluster analysis. The problem is regarded as a model selection problem in the model-based cluster analysis context. A general model, generalizing the model of Raftery and Dean (2006), is proposed to specify the role of each variable. This model does not require any prior assumption about the link between the selected and discarded variables. Models are compared with BIC. The role of each variable is obtained through an algorithm embedding two backward stepwise variable selection algorithms, one for clustering and one for linear regression. The consistency of the resulting criterion is proved under regularity conditions. Numerical experiments on simulated datasets and a genomics application highlight the value of the proposed variable selection procedure.
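The BIC comparison behind the variable-role decision can be illustrated with a short sketch. This is a minimal example in Python, assuming a Raftery and Dean (2006) style comparison for a single candidate variable: the BIC of a Gaussian mixture on the current clustering variables plus the candidate is weighed against the mixture BIC without the candidate plus the BIC of a Gaussian linear regression of the candidate on other variables. The function names (`bic`, `regression_bic`, `is_clustering_variable`) are hypothetical, the clustering BIC values are assumed to come from a separate Gaussian mixture fit, and the report's own procedure (which also selects the regressors and embeds two backward stepwise searches) is not reproduced here.

```python
import numpy as np


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """BIC in the 'larger is better' convention: 2 log L - k log n."""
    return 2.0 * log_likelihood - n_params * np.log(n_obs)


def regression_bic(y: np.ndarray, X: np.ndarray) -> float:
    """BIC of a Gaussian linear regression of y on the columns of X, with intercept.

    X may have zero columns, in which case only the intercept is fitted.
    """
    n = y.shape[0]
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / n  # maximum-likelihood estimate of the error variance
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    n_params = design.shape[1] + 1  # regression coefficients + error variance
    return bic(loglik, n_params, n)


def is_clustering_variable(bic_clust_with: float,
                           bic_clust_without: float,
                           bic_reg: float) -> bool:
    """Hypothetical decision rule: a positive BIC difference favors
    keeping the candidate as a clustering variable."""
    return bic_clust_with - (bic_clust_without + bic_reg) > 0
```

A positive difference favors treating the candidate as informative for the clustering; otherwise the candidate is better described by a regression on the other variables and is not needed to define the clusters.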
Main file
RR-6211.pdf (338.21 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00153057, version 1 (08-06-2007)
inria-00153057, version 2 (11-06-2007)

Identifiers

  • HAL Id: inria-00153057, version 2
  • PRODINRA: 251692

Cite

Cathy Maugis, Gilles Celeux, Marie-Laure Martin-Magniette. Variable Selection for Clustering with Gaussian Mixture Models. [Research Report] RR-6211, INRIA. 2007, pp.35. ⟨inria-00153057v2⟩
