Self-Organized Co-Clustering for textual data synthesis

Margot Selosse 1, Julien Jacques 1, Christophe Biernacki 2
2 MODAL - MOdel for Data Analysis and Learning
Inria Lille - Nord Europe, LPP - Laboratoire Paul Painlevé - UMR 8524, CERIM - Santé publique : épidémiologie et qualité des soins-EA 2694, Polytech Lille - École polytechnique universitaire de Lille, Université de Lille, Sciences et Technologies
Abstract : Several recent studies have demonstrated the value of co-clustering, which simultaneously produces row-clusters of observations and column-clusters of features. The present work introduces a novel co-clustering model for parsimoniously summarizing textual data in document-term format. In addition to highlighting homogeneous co-clusters, as other existing algorithms do, we also distinguish noisy co-clusters from significant ones, which is particularly useful for sparse document-term matrices. Furthermore, our model proposes a structure among the significant co-clusters and thus provides better interpretability for the user. The proposed approach competes with state-of-the-art methods for document and term clustering and offers user-friendly results. The model relies on the Poisson distribution and on a constrained version of the Latent Block Model, a probabilistic approach to co-clustering. A Stochastic Expectation-Maximization algorithm is proposed to perform model inference, together with a model selection criterion to choose the number of co-clusters.
Document type :
Preprints, Working Papers, ...
Contributor : Margot Selosse
Submitted on : Tuesday, April 30, 2019 - 10:53:00 AM
Last modification on : Monday, May 13, 2019 - 3:51:44 PM


Files produced by the author(s)


  • HAL Id : hal-02115294, version 1


Margot Selosse, Julien Jacques, Christophe Biernacki. Self-Organized Co-Clustering for textual data synthesis. 2019. ⟨hal-02115294⟩


