
Model-based co-clustering for mixed type data

Margot Selosse (1, 2), Julien Jacques (1, 2), Christophe Biernacki (3)
3 MODAL - MOdel for Data Analysis and Learning
Inria Lille - Nord Europe, LPP - Laboratoire Paul Painlevé - UMR 8524, METRICS - Evaluation des technologies de santé et des pratiques médicales - ULR 2694, Polytech Lille - École polytechnique universitaire de Lille, Université de Lille, Sciences et Technologies
Abstract : Clustering is a well-established way to create groups of observations. The emergence of high-dimensional data sets with very large numbers of features has motivated co-clustering techniques, which simultaneously produce groups of observations and groups of features. By partitioning the data set into blocks (each block being the crossing of a row-cluster and a column-cluster), these techniques can sometimes better summarize the data set and its inherent structure. The Latent Block Model (LBM) is a well-known method for performing co-clustering. However, data sets whose features are of different types (here called mixed-type data sets) are becoming more common, and the LBM is not directly applicable to them. Here a natural extension of the usual LBM, the "Multiple Latent Block Model" (MLBM), is proposed to handle mixed-type data sets. Inference is performed using a Stochastic EM algorithm that embeds a Gibbs sampler and allows for missing data. A model selection criterion is defined to choose the number of row and column clusters. The method is then applied to both simulated and real data sets.
Document type : Journal articles

Cited literature [42 references]
Contributor : Margot Selosse
Submitted on : Friday, October 11, 2019 - 4:16:15 PM
Last modification on : Friday, November 27, 2020 - 2:18:03 PM


Margot Selosse, Julien Jacques, Christophe Biernacki. Model-based co-clustering for mixed type data. Computational Statistics and Data Analysis, Elsevier, 2020, 144, pp.106866. ⟨10.1016/j.csda.2019.106866⟩. ⟨hal-01893457v2⟩


