Model-based co-clustering for mixed type data

Margot Selosse (1, 2), Julien Jacques (1, 2), Christophe Biernacki (3)
3 MODAL - MOdel for Data Analysis and Learning
Inria Lille - Nord Europe, LPP - Laboratoire Paul Painlevé - UMR 8524, CERIM - Santé publique : épidémiologie et qualité des soins-EA 2694, Polytech Lille - École polytechnique universitaire de Lille, Université de Lille, Sciences et Technologies
Abstract : Over several decades, many studies have shown the value of clustering for revealing groups of observations. More recently, the emergence of high-dimensional datasets with a very large number of features has motivated co-clustering techniques, which simultaneously produce groups of observations and groups of features. By summarizing the dataset in blocks (each block being the crossing of a row-cluster and a column-cluster), co-clustering can sometimes give a better summary of the data and its inherent structure. The Latent Block Model (LBM) is a well-known model for co-clustering. However, datasets with features of different types (here called mixed type datasets) are becoming more common, and the LBM is not directly applicable to them. The present work extends the usual LBM to the so-called Multiple Latent Block Model (MLBM), which is able to handle mixed type datasets. Inference is done through a Stochastic EM algorithm embedding a Gibbs sampler, and a model selection criterion is defined to choose the number of row and column clusters. The method was successfully applied to simulated and real datasets.
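The notion of a block (the crossing of a row-cluster and a column-cluster) can be illustrated with a minimal Python sketch. It is not the LBM or MLBM inference described in the abstract: the row and column partitions are assumed to be already known, the data matrix is made up, and the helper name block_means is hypothetical. It only shows how a co-clustering summarizes a matrix by one value per block.

```python
from collections import defaultdict


def block_means(data, row_labels, col_labels):
    """Average the entries of `data` inside each (row-cluster, column-cluster) block.

    Illustrative helper only: the partitions are given as input, not estimated.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            block = (row_labels[i], col_labels[j])
            sums[block] += value
            counts[block] += 1
    return {block: sums[block] / counts[block] for block in sums}


# Toy 4x4 matrix with an obvious 2x2 block structure (made-up values).
data = [
    [1.0, 1.0, 5.0, 5.0],
    [1.0, 1.0, 5.0, 5.0],
    [9.0, 9.0, 3.0, 3.0],
    [9.0, 9.0, 3.0, 3.0],
]
row_labels = [0, 0, 1, 1]  # assumed row partition
col_labels = [0, 0, 1, 1]  # assumed column partition

summary = block_means(data, row_labels, col_labels)
for block in sorted(summary):
    print(block, summary[block])
```

Here the 16 entries are summarized by 4 block values. In a model-based approach such as the LBM, the partitions themselves are unknown and must be inferred together with the block parameters.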
Document type :
Preprints, Working Papers, ...

https://hal.archives-ouvertes.fr/hal-01893457
Contributor : Margot Selosse
Submitted on : Thursday, October 11, 2018 - 2:23:39 PM
Last modification on : Friday, April 19, 2019 - 4:55:17 PM
Long-term archiving on : Saturday, January 12, 2019 - 2:23:14 PM

File

model-based-clustering.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01893457, version 1

Citation

Margot Selosse, Julien Jacques, Christophe Biernacki. Model-based co-clustering for mixed type data. 2018. ⟨hal-01893457⟩
