Book chapter, Year: 2016

Clustering of Variables for Mixed Data

Abstract

This chapter presents clustering of variables, whose aim is to group together strongly related variables. The proposed approach works on mixed data sets, i.e. data sets that contain both numerical and categorical variables. Two algorithms for clustering variables are described: a hierarchical clustering and a k-means type clustering. A brief description of the PCAmix method (a principal component analysis for mixed data) is provided, since the computation of the synthetic variables summarizing the obtained clusters of variables is based on this multivariate method. Finally, the R packages ClustOfVar and PCAmixdata are illustrated on real mixed data. The PCAmix (resp. ClustOfVar) approach is first used for dimension reduction (step 1) before standard clustering of the individuals (step 2).
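As a rough illustration of how a synthetic variable can summarize a cluster of mixed variables, the sketch below computes the first component of a PCAmix-style decomposition with NumPy. It is not the ClustOfVar or PCAmixdata API: the function names `pcamix_matrix` and `cluster_homogeneity` and the toy data are assumptions made for this example. It relies on the standard property that the first eigenvalue equals the sum of squared correlations (numerical variables) and correlation ratios (categorical variables) with the first component, which is the quantity usually taken as the homogeneity of a cluster of variables.

```python
import numpy as np

def pcamix_matrix(quanti, quali):
    """Build the weighted, centered matrix used in this PCAmix-style sketch.

    quanti: list of numerical variables (sequences of floats, non-constant)
    quali:  list of categorical variables (sequences of labels)
    """
    n = len(quanti[0]) if quanti else len(quali[0])
    blocks = []
    for x in quanti:
        x = np.asarray(x, dtype=float)
        # Standardize with the population standard deviation.
        blocks.append(((x - x.mean()) / x.std())[:, None])
    for q in quali:
        q = np.asarray(q)
        levels = np.unique(q)
        # Indicator matrix: one column per category.
        G = (q[:, None] == levels[None, :]).astype(float)
        prop = G.mean(axis=0)  # category proportions n_s / n
        # Center each indicator and weight it by 1 / sqrt(proportion).
        blocks.append((G - prop) / np.sqrt(prop))
    return np.hstack(blocks) / np.sqrt(n)

def cluster_homogeneity(quanti, quali):
    """Return (first eigenvalue, standardized first-component scores)."""
    Z = pcamix_matrix(quanti, quali)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    synthetic = U[:, 0] * np.sqrt(Z.shape[0])  # mean 0, variance 1
    return s[0] ** 2, synthetic

# Hypothetical toy cluster: two numerical variables and one categorical one.
height = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
weight = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
group = ["a", "a", "b", "b", "c", "c"]

homogeneity, synthetic = cluster_homogeneity([height, weight], [group])
print(round(homogeneity, 3))
```

A hierarchical or k-means type algorithm for variables would then compare such homogeneity values across candidate partitions of the variables; that search step is left out of this sketch.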
No file deposited

Dates and versions

hal-01417442, version 1 (15-12-2016)

Identifiers

  • HAL Id: hal-01417442, version 1

Cite

Jerome Saracco, Marie Chavent. Clustering of Variables for Mixed Data. Statistics for Astrophysics: Clustering and Classification, 77, EDP Sciences, pp.91-119, 2016, EAS Publications Series, 978-2-7598-9001-9. ⟨hal-01417442⟩