Regularized bi-directional co-clustering - HAL open archive
Journal article, Statistics and Computing, Year: 2021

Regularized bi-directional co-clustering

Séverine Affeldt
Lazhar Labiod
Mohamed Nadif

Abstract

The simultaneous clustering of documents and words, known as co-clustering, has proved to be more effective than one-sided clustering in dealing with sparse, high-dimensional datasets. By their nature, text data are also generally unbalanced and directional. Recently, the von Mises–Fisher (vMF) mixture model was proposed to handle unbalanced data while harnessing the directional nature of text. In this paper, we propose a general co-clustering framework based on a matrix formulation of vMF model-based co-clustering. This formulation leads to a flexible framework for text co-clustering that can easily incorporate both word–word semantic relationships and document–document similarities. By contrast with existing methods, which generally use an additive incorporation of similarities, we propose a bi-directional multiplicative regularization that better encapsulates the underlying text data structure. Extensive evaluations on various real-world text datasets demonstrate the superior performance of our proposed approach over baseline and competing methods, both in terms of clustering results and co-cluster topic coherence.
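The abstract does not state the exact objective, so the snippet below is only a minimal illustrative sketch under stated assumptions, not the authors' algorithm. It assumes that "bi-directional multiplicative regularization" can be pictured as pre- and post-multiplying a document–word matrix X by a document–document similarity matrix S_r and a word–word similarity matrix S_c, followed by L2-normalizing each document row, since vMF-based models work with unit-norm vectors. The function names (`bidirectional_smooth`, `l2_normalize_rows`), the matrices `S_r` and `S_c`, and the toy data are all hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def l2_normalize_rows(x):
    """Project each document (row) onto the unit hypersphere, as vMF-based models expect."""
    out = []
    for row in x:
        norm = math.sqrt(sum(v * v for v in row))
        # Leave all-zero rows unchanged to avoid division by zero.
        out.append([v / norm for v in row] if norm > 0 else list(row))
    return out


def bidirectional_smooth(x, s_rows, s_cols):
    """Hypothetical multiplicative regularization: S_r @ X @ S_c.

    s_rows: n x n document-document similarity (assumed given).
    s_cols: m x m word-word similarity (assumed given).
    """
    return matmul(matmul(s_rows, x), s_cols)


# Toy 3-document x 3-word count matrix (illustrative data only).
X = [[2, 0, 0],
     [0, 1, 0],
     [0, 0, 3]]

# Assumption: documents 0 and 1 are similar, and words 0 and 1 are related.
S_r = [[1.0, 0.5, 0.0],
       [0.5, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
S_c = [[1.0, 0.5, 0.0],
       [0.5, 1.0, 0.0],
       [0.0, 0.0, 1.0]]

X_reg = l2_normalize_rows(bidirectional_smooth(X, S_r, S_c))
```

In this toy case, document 1 never contains word 0, yet `X_reg[1][0]` is positive: mass reaches that cell both through the similar document 0 and through the related word 1, and every row of `X_reg` has unit norm. The abstract contrasts this kind of combined, multiplicative use of both similarities with the additive incorporation used by existing methods; the precise formulation is given in the full text.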
Main file: 14-RegularizedBi-DirectionalCo-Clustering_StatisticsAndComputing_hal-03543057v1.pdf (1.61 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03543057, version 1 (18-03-2022)

Identifiers

Cite

Séverine Affeldt, Lazhar Labiod, Mohamed Nadif. Regularized bi-directional co-clustering. Statistics and Computing, 2021, 31 (3), pp.32. ⟨10.1007/s11222-021-10006-w⟩. ⟨hal-03543057⟩