Preprint, Working Paper. Year: 2017

SHCoClust, a Scalable Similarity-based Hierarchical Co-clustering Method and its Application to Textual Collections

Abstract

Compared with flat clustering methods such as K-means, hierarchical clustering and co-clustering methods offer distinct advantages: hierarchical clustering can reveal the internal connections between clusters, and co-clustering can yield clusters of both data instances and features. Motivated by organizing co-clusters into hierarchies and discovering cluster hierarchies inside co-clusters, we propose in this paper SHCoClust, a scalable similarity-based hierarchical co-clustering method. Besides combining these advantages, SHCoClust can employ kernel functions because it relies on inner products. Furthermore, since all similarities lie between 0 and 1, the input of SHCoClust can be sparsified with threshold values, so that less memory and less time are required for storage and computation. This makes SHCoClust scalable, i.e., able to process relatively large datasets with limited computing resources. Our experiments demonstrate that SHCoClust significantly outperforms conventional hierarchical clustering methods. In addition, when the input similarity matrices obtained with a linear kernel and with a Gaussian kernel are sparsified, SHCoClust maintains clustering quality even when its input is largely sparsified. Consequently, a time gain of up to 86% and an average memory gain of 75% are achieved.
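The sparsification idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`cosine_similarity`, `gaussian_similarity`, `sparsify`), the toy term-count vectors, the `gamma` parameter, and the threshold of 0.5 are all assumptions chosen for illustration. It only shows how similarities bounded in [0, 1] can be thresholded so that fewer entries need to be stored.

```python
import math


def cosine_similarity(u, v):
    """Linear kernel on L2-normalized vectors (assumed normalization).

    For non-negative inputs such as term counts, the value lies in [0, 1].
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def gaussian_similarity(u, v, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||u - v||^2); always in (0, 1]."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def sparsify(vectors, kernel, threshold):
    """Keep only pairwise similarities at or above the threshold.

    Returns a dict {(i, j): similarity} for i < j, a simple sparse
    representation of the upper triangle of the similarity matrix.
    """
    kept = {}
    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):
            s = kernel(vectors[i], vectors[j])
            if s >= threshold:
                kept[(i, j)] = s
    return kept


# Hypothetical toy collection: four documents over a four-term vocabulary.
docs = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
]

dense_pairs = len(docs) * (len(docs) - 1) // 2
sparse = sparsify(docs, cosine_similarity, threshold=0.5)
print(f"stored {len(sparse)} of {dense_pairs} pairwise similarities")
```

In this toy case the two within-topic pairs survive the threshold and the four cross-topic pairs are dropped. How such a sparsified matrix is then used to build the co-cluster hierarchy is described in the paper itself and is not reproduced here.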
Main file
PID4744355[camera_read_submissiom_2].pdf (981.98 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01504986, version 1 (12-04-2017)

License

Copyright (All rights reserved)

Identifiers

  • HAL Id: hal-01504986, version 1

Cite

Xinyu Wang, Julien Ah-Pine, Jérôme Darmont. SHCoClust, a Scalable Similarity-based Hierarchical Co-clustering Method and its Application to Textual Collections. 2017. ⟨hal-01504986v1⟩
411 views
153 downloads
