# Model-based co-clustering for mixed type data

MODAL - MOdel for Data Analysis and Learning
Inria Lille - Nord Europe, LPP - Laboratoire Paul Painlevé - UMR 8524, METRICS - Evaluation des technologies de santé et des pratiques médicales - ULR 2694, Polytech Lille - École polytechnique universitaire de Lille, Université de Lille, Sciences et Technologies
Abstract : Clustering is a well-established way to create groups of observations. The emergence of high-dimensional data sets with a huge number of features has motivated co-clustering techniques, and several methods have been developed to produce groups of observations and groups of features simultaneously. By partitioning the data set into blocks (each block being the crossing of a row-cluster and a column-cluster), these techniques can sometimes summarize the data set and its inherent structure better. The Latent Block Model (LBM) is a well-known method for performing co-clustering. However, data sets with features of different types (here called mixed type data sets) are becoming more common, and the LBM is not directly applicable to them. Here, a natural extension of the usual LBM, the Multiple Latent Block Model (MLBM), is proposed to handle mixed type data sets. Inference is performed using a Stochastic EM algorithm that embeds a Gibbs sampler and allows for missing data. A model selection criterion is defined to choose the number of row and column clusters. The method is then applied to both simulated and real data sets.
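To make the notion of a block concrete, here is a minimal Python sketch of how a row partition and a column partition jointly summarize a data matrix. It is an illustration only, not the authors' MLBM or its SEM-Gibbs inference: the partitions are assumed to be given rather than inferred, the data are purely continuous, and the function name `block_means` and the toy matrix are hypothetical.

```python
import numpy as np

def block_means(x, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Return the mean of each block (row-cluster g crossed with column-cluster h).

    Assumes the row and column partitions are already known; a co-clustering
    model would estimate them from the data instead.
    """
    means = np.full((n_row_clusters, n_col_clusters), np.nan)
    for g in range(n_row_clusters):
        for h in range(n_col_clusters):
            # Select the cells whose row is in cluster g and column in cluster h.
            block = x[np.ix_(row_labels == g, col_labels == h)]
            if block.size > 0:
                means[g, h] = block.mean()
    return means

# Toy 3 x 3 matrix with two row-clusters and two column-clusters (made-up data).
x = np.array([
    [1.0, 1.0, 5.0],
    [1.0, 1.0, 5.0],
    [9.0, 9.0, 3.0],
])
row_labels = np.array([0, 0, 1])
col_labels = np.array([0, 0, 1])

summary = block_means(x, row_labels, col_labels, 2, 2)
print(summary)
```

The 3 x 3 matrix is reduced to a 2 x 2 table of block summaries, which is the sense in which blocks can summarize a data set. For mixed type data, a single mean per block would not be meaningful for every feature type, which is the limitation that motivates the extension described in the abstract.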
Document type : Journal articles

Cited literature [42 references]

https://hal.archives-ouvertes.fr/hal-01893457
Contributor : Margot Selosse
Submitted on : Friday, October 11, 2019 - 4:16:15 PM
Last modification on : Friday, November 27, 2020 - 2:18:03 PM

### File

manuscript.pdf
Files produced by the author(s)

### Citation

Margot Selosse, Julien Jacques, Christophe Biernacki. Model-based co-clustering for mixed type data. Computational Statistics and Data Analysis, Elsevier, 2020, 144, pp.106866. ⟨10.1016/j.csda.2019.106866⟩. ⟨hal-01893457v2⟩
