Conference paper. Year: 2021

Regularized Dual-PPMI Co-clustering for Text Data

Séverine Affeldt
Lazhar Labiod
Mohamed Nadif

Abstract

Co-clustering of document-term matrices has proved to be more effective than one-sided clustering. By their nature, text data are also generally unbalanced and directional. Recently, the von Mises-Fisher (vMF) mixture model was proposed to handle unbalanced data while harnessing the directional nature of text. In this paper we propose a novel co-clustering approach based on a matrix formulation of vMF model-based co-clustering. This formulation leads to a flexible method for text co-clustering that can easily incorporate both word-word semantic relationships and document-document similarities. By contrast with existing methods, which generally use an additive incorporation of similarities, we propose a dual multiplicative regularization that better encapsulates the underlying text data structure. Extensive evaluations on various real-world text datasets demonstrate the superior performance of our proposed approach over baseline and competitive methods, both in terms of clustering results and co-cluster topic coherence.
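
To make the idea of a dual multiplicative regularization concrete, here is a minimal sketch; it is not the authors' algorithm. It assumes, for illustration only, that the word-word and document-document similarities are PPMI (positive pointwise mutual information) matrices built from the co-occurrence counts X^T X and X X^T, that the identity is added to each so every row keeps its own signal, and that the regularized matrix is the product M_doc X M_word with rows scaled to unit length to reflect the directional (vMF) view of text. The function names `ppmi` and `dual_ppmi_regularize` are hypothetical.

```python
import numpy as np


def ppmi(counts: np.ndarray) -> np.ndarray:
    """Positive pointwise mutual information of a non-negative count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        expected = row @ col / total      # expected count under independence
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0          # zero counts carry no association
    return np.maximum(pmi, 0.0)           # keep positive associations only


def dual_ppmi_regularize(X: np.ndarray) -> np.ndarray:
    """Multiply a document-term matrix by document- and word-side PPMI matrices.

    Assumption (not from the paper): similarities come from X X^T and X^T X,
    and the identity is added so each document and word keeps its own weight.
    """
    X = np.asarray(X, dtype=float)
    M_doc = ppmi(X @ X.T) + np.eye(X.shape[0])    # document-document side
    M_word = ppmi(X.T @ X) + np.eye(X.shape[1])   # word-word side
    Z = M_doc @ X @ M_word                        # multiplicative, not additive
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                       # leave empty rows untouched
    return Z / norms                              # unit-length rows (directional data)


# Toy document-term matrix with two obvious document/word blocks.
X = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 1, 2],
              [0, 0, 2, 1]])
Z = dual_ppmi_regularize(X)
```

On this toy matrix the block structure is preserved: documents from the first block receive no weight on words from the second block. The rows of `Z` could then be passed to any co-clustering or spherical clustering routine, which is outside the scope of this sketch.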
No file deposited

Dates and versions

hal-03538921, version 1 (18-03-2022)

Cite

Séverine Affeldt, Lazhar Labiod, Mohamed Nadif. Regularized Dual-PPMI Co-clustering for Text Data. SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2021, Virtual Event, Canada. pp.2263-2267, ⟨10.1145/3404835.3463065⟩. ⟨hal-03538921⟩