Regularized bi-directional co-clustering

Séverine Affeldt; Lazhar Labiod; Mohamed Nadif

doi:10.1007/s11222-021-10006-w

Journal article, Statistics and Computing, 2021


Séverine Affeldt

Role: Author
PersonId: 754923
IdHAL: severine-affeldt
ORCID: 0000-0002-4107-0887
IdRef: 18852486X

CB - Centre Borelli - UMR 9010

Lazhar Labiod

Role: Author
PersonId: 753798
IdHAL: labiod-lazhar
ORCID: 0000-0001-8641-8050
IdRef: 136329748

CB - Centre Borelli - UMR 9010

Mohamed Nadif

Role: Author
PersonId: 761227
ORCID: 0000-0002-0007-3950
IdRef: 139245286

CB - Centre Borelli - UMR 9010

Abstract

The simultaneous clustering of documents and words, known as co-clustering, has proved more effective than one-sided clustering for sparse, high-dimensional datasets. By their nature, text data are also generally unbalanced and directional. Recently, the von Mises–Fisher (vMF) mixture model was proposed to handle unbalanced data while harnessing the directional nature of text. In this paper, we propose a general co-clustering framework based on a matrix formulation of vMF model-based co-clustering. This formulation yields a flexible framework for text co-clustering that can easily incorporate both word–word semantic relationships and document–document similarities. Unlike existing methods, which generally incorporate similarities additively, we propose a bi-directional multiplicative regularization that better captures the underlying structure of text data. Extensive evaluations on various real-world text datasets show that the proposed approach outperforms baseline and competing methods in terms of both clustering quality and co-cluster topic coherence.
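The abstract names the ingredients (unit-norm, directional document vectors; a co-clustering criterion; similarity information applied on both the document side and the word side) without giving the algorithm. The sketch below is a hypothetical illustration, not the authors' method. It assumes the bi-directional multiplicative regularization can be pictured as forming S_D X S_W from a document–document similarity S_D and a word–word similarity S_W, and it swaps in a simple cosine-based diagonal co-clustering (row cluster k paired with column cluster k, seeded by farthest-first traversal) as a stand-in for the vMF model. All function names are illustrative.

```python
import numpy as np


def l2_normalize_rows(A, eps=1e-12):
    """Project each row onto the unit sphere (directional representation)."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    return A / np.maximum(norms, eps)


def bidirectional_regularize(X, S_docs, S_words):
    """Hypothetical multiplicative regularization: S_docs @ X @ S_words.

    S_docs (n x n) mixes similar documents, S_words (d x d) mixes similar words;
    the result keeps the n x d shape of X.
    """
    return S_docs @ X @ S_words


def farthest_first_rows(M, n_clusters):
    """Deterministic seeding: repeatedly pick the row least similar to chosen seeds."""
    seeds = [0]
    for _ in range(1, n_clusters):
        best_sim = (M @ M[seeds].T).max(axis=1)  # cosine to the nearest seed
        best_sim[seeds] = np.inf                 # never pick a seed twice
        seeds.append(int(np.argmin(best_sim)))
    return np.argmax(M @ M[seeds].T, axis=1)


def directional_coclustering(X, n_clusters, n_iter=100):
    """Diagonal co-clustering with a cosine (directional) criterion.

    Returns (row_labels, col_labels); row cluster k is paired with column cluster k.
    """
    M = l2_normalize_rows(np.asarray(X, dtype=float))
    row_labels = farthest_first_rows(M, n_clusters)
    col_labels = None
    for _ in range(n_iter):
        # Columns: cosine between word column j and row-cluster indicator z_k
        # (the column norm is constant over k, so it is dropped from the argmax).
        Z = np.eye(n_clusters)[row_labels]
        row_sizes = np.maximum(Z.sum(axis=0), 1.0)
        new_cols = np.argmax((M.T @ Z) / np.sqrt(row_sizes), axis=1)
        # Rows: cosine between unit document i and column-cluster indicator w_k.
        W = np.eye(n_clusters)[new_cols]
        col_sizes = np.maximum(W.sum(axis=0), 1.0)
        new_rows = np.argmax((M @ W) / np.sqrt(col_sizes), axis=1)
        converged = (col_labels is not None
                     and np.array_equal(new_cols, col_labels)
                     and np.array_equal(new_rows, row_labels))
        row_labels, col_labels = new_rows, new_cols
        if converged:
            break
    return row_labels, col_labels
```

Applied to an 8 × 8 matrix made of two disjoint blocks of four documents and four words, with S_D = I and S_W a block-averaging word similarity, the sketch recovers both blocks for rows and columns. The multiplicative form leaves the matrix dimensions unchanged, whereas additive schemes typically attach a separate similarity penalty term to the objective.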

Keywords

Co-clustering; Regularization; Information retrieval; Text mining

Domains

Computer Science [cs]

Main file

14-RegularizedBi-DirectionalCo-Clustering_StatisticsAndComputing_hal-03543057v1.pdf (1.61 MB)

Origin: Files produced by the author(s)


https://hal.science/hal-03543057

Submitted on: Friday, 18 March 2022, 13:06:06

Last modified on: Friday, 26 April 2024, 13:54:49

Long-term archiving on: Sunday, 19 June 2022, 18:55:12

Dates and versions

hal-03543057, version 1 (18-03-2022)

Identifiers

HAL Id: hal-03543057, version 1
DOI: 10.1007/s11222-021-10006-w

Cite

Séverine Affeldt, Lazhar Labiod, Mohamed Nadif. Regularized bi-directional co-clustering. Statistics and Computing, 2021, 31 (3), pp.32. ⟨10.1007/s11222-021-10006-w⟩. ⟨hal-03543057⟩


Collections

INSERM SSA CNRS ENS-CACHAN INSMI UNIV-PARIS-SACLAY UP-SCIENCES ANR ENS-PARIS-SACLAY CB_UMR9010 GS-MATHEMATIQUES

