Model-based co-clustering for mixed type data

Margot Selosse (1, 2), Julien Jacques (1, 2), Christophe Biernacki (3)
(3) MODAL - MOdel for Data Analysis and Learning
Inria Lille - Nord Europe, LPP - Laboratoire Paul Painlevé - UMR 8524, CERIM - Santé publique : épidémiologie et qualité des soins-EA 2694, Polytech Lille - École polytechnique universitaire de Lille, Université de Lille, Sciences et Technologies
Abstract : Clustering is a well-established tool for grouping observations. The emergence of high-dimensional data sets with very large numbers of features has motivated co-clustering techniques, which simultaneously group observations and features. By partitioning the data set into blocks (each the crossing of a row cluster and a column cluster), these techniques can sometimes summarize the data set and its inherent structure better. The Latent Block Model (LBM) is a well-known method for co-clustering. However, data sets with features of different types (here called mixed-type data sets) are becoming more common, and the LBM is not directly applicable to them. Here, a natural extension of the usual LBM, the "Multiple Latent Block Model" (MLBM), is proposed to handle mixed-type data sets. Inference is performed using a Stochastic EM algorithm that embeds a Gibbs sampler and allows for missing data. A model selection criterion is defined to choose the number of row and column clusters. The method is then applied to both simulated and real data sets.
Document type :
Journal articles

https://hal.archives-ouvertes.fr/hal-01893457
Contributor : Margot Selosse
Submitted on : Friday, October 11, 2019 - 4:16:15 PM
Last modification on : Saturday, October 12, 2019 - 1:33:54 AM

Identifiers

  • HAL Id : hal-01893457, version 2

Citation

Margot Selosse, Julien Jacques, Christophe Biernacki. Model-based co-clustering for mixed type data. Computational Statistics and Data Analysis, Elsevier, In press. ⟨hal-01893457v2⟩
