Model-based clustering of Gaussian copulas for mixed data - HAL open archive
Preprint, Working Paper Year: 2014

Model-based clustering of Gaussian copulas for mixed data

Abstract

A mixture model of Gaussian copulas is introduced to cluster mixed-type data (data sets composed of variables of different types). The analysis can thus be performed on data sets composed of any kind of variables admitting a cumulative distribution function. Copulas are used to model the intra-class dependencies while allowing any distribution for the one-dimensional margins of each component. In this work, each component follows a Gaussian copula, which provides one correlation coefficient per pair of variables and per class. Moreover, the one-dimensional margins of each component follow classical parametric distributions in order to facilitate model interpretation. This model generalizes many well-known models and allows meaningful data visualization as a straightforward by-product. A Metropolis-within-Gibbs sampler performs the Bayesian inference, avoiding the difficulties related to estimating the parameters of copulas with discrete margins. Experiments on simulated and real data illustrate the advantages of the model: flexible parameters (one-dimensional margins and correlation matrices) combined with visualization capabilities.
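
To make the generative idea in the abstract concrete, here is a minimal sketch (not the authors' implementation) of how one might simulate mixed data from a mixture of Gaussian copulas. It assumes two variables per observation, one continuous with a Gaussian margin and one count with a Poisson margin. The function names, the choice of margins, and the parameter names (rho, mu, sigma, lam) are illustrative assumptions. The Metropolis-within-Gibbs inference step is not shown.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal used for the latent copula layer


def poisson_quantile(u, lam, max_k=10_000):
    """Smallest integer k with P(X <= k) >= u for X ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    # max_k guards against u being numerically indistinguishable from 1.
    while cdf < u and k < max_k:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def sample_component(n, rho, mu, sigma, lam, rng):
    """Draw n (continuous, count) pairs from one Gaussian-copula component.

    Assumed margins: Gaussian(mu, sigma) and Poisson(lam). rho is the
    correlation of the latent bivariate normal for this component.
    """
    draws = []
    for _ in range(n):
        # Latent bivariate standard normal with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Gaussian margin: the quantile transform of Phi(z1) is a shift and scale.
        x_cont = mu + sigma * z1
        # Discrete margin: map z2 to a uniform, then apply the Poisson quantile.
        x_count = poisson_quantile(STD_NORMAL.cdf(z2), lam)
        draws.append((x_cont, x_count))
    return draws


def sample_mixture(n, weights, components, seed=0):
    """Draw n observations: pick a class, then sample from its component."""
    rng = random.Random(seed)
    data, labels = [], []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        data.extend(sample_component(1, rng=rng, **components[k]))
        labels.append(k)
    return data, labels
```

For example, `sample_mixture(200, [0.5, 0.5], [dict(rho=0.8, mu=0.0, sigma=1.0, lam=3.0), dict(rho=-0.5, mu=5.0, sigma=1.0, lam=10.0)])` would draw from two classes whose margins and within-class correlation both differ, which is the kind of flexibility the abstract describes.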
Main file
cluster_hetero_gaussian_copula.pdf (591.91 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00987760, version 1 (06-05-2014)
hal-00987760, version 2 (13-08-2014)
hal-00987760, version 3 (29-09-2015)
hal-00987760, version 4 (20-12-2016)

Identifiers

  • HAL Id: hal-00987760, version 2

Cite

Matthieu Marbac, Christophe Biernacki, Vincent Vandewalle. Model-based clustering of Gaussian copulas for mixed data. 2014. ⟨hal-00987760v2⟩
698 views
1260 downloads
