Regularized bi-directional co-clustering
Abstract
The simultaneous clustering of documents and words, known as co-clustering, has proved to be more effective than one-sided
clustering in dealing with sparse high-dimensional datasets. By their nature, text data are also generally unbalanced and
directional. Recently, the von Mises–Fisher (vMF) mixture model was proposed to handle unbalanced data while harnessing
the directional nature of text. In this paper, we propose a general co-clustering framework based on a matrix formulation
of vMF model-based co-clustering. This formulation leads to a flexible framework for text co-clustering that can easily
incorporate both word–word semantic relationships and document–document similarities. By contrast with existing methods,
which generally use an additive incorporation of similarities, we propose a bi-directional multiplicative regularization that
better encapsulates the underlying text data structure. Extensive evaluations on various real-world text datasets demonstrate
the superior performance of our proposed approach over baseline and competitive methods, both in terms of clustering results
and co-cluster topic coherence.
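To make the "directional" idea concrete, the sketch below shows spherical k-means, which is the hard-assignment limit of a vMF mixture with a shared concentration parameter: documents are scaled to unit length and grouped by cosine similarity. It is not the authors' regularized bi-directional algorithm. The word-assignment step (attach each word to the document cluster carrying most of its weight, giving a diagonal block structure) is a hypothetical simplification added only for illustration, and the function names `spherical_kmeans` and `assign_words` are assumptions, not part of any published code.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm so it lies on the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(rows, k, iters=20, seed=0):
    """Hard vMF clustering with equal concentrations (spherical k-means):
    each document goes to the mean direction with the highest cosine similarity."""
    rng = random.Random(seed)
    X = [normalize(r) for r in rows]
    centroids = [list(c) for c in rng.sample(X, k)]
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in X]
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # keep the old direction if a cluster empties
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels


def assign_words(rows, doc_labels, k):
    """Hypothetical diagonal co-clustering step: give each word (column) the
    label of the document cluster in which its total weight is largest."""
    word_labels = []
    for w in range(len(rows[0])):
        mass = [0.0] * k
        for r, lab in zip(rows, doc_labels):
            mass[lab] += r[w]
        word_labels.append(max(range(k), key=lambda j: mass[j]))
    return word_labels


# Toy document-term matrix with two obvious topic blocks.
X = [[3, 2, 0, 0], [4, 1, 0, 0], [0, 0, 2, 5], [0, 1, 3, 4]]
docs = spherical_kmeans(X, k=2)
words = assign_words(X, docs, k=2)
print(docs, words)  # cluster ids may be permuted between runs/seeds
```

Because only the direction of each document vector matters after normalization, a long document and a short one with the same word proportions end up in the same cluster, which is the property the vMF model exploits for text.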
Domains
Computer Science [cs]
Main file
14-RegularizedBi-DirectionalCo-Clustering_StatisticsAndComputing_hal-03543057v1.pdf (1.61 MB)
Origin: Files produced by the author(s)