SHCoClust, a Scalable Similarity-based Hierarchical Co-clustering Method and its Application to Textual Collections

Abstract: Compared with flat clustering methods such as K-means, hierarchical clustering and co-clustering offer distinct advantages: hierarchical clustering can reveal the internal connections between clusters, and co-clustering yields clusters of both data instances and features. Interested in organizing co-clusters in hierarchies and in discovering cluster hierarchies inside co-clusters, we propose in this paper SHCoClust, a scalable similarity-based hierarchical co-clustering method. Besides combining the above-mentioned advantages, SHCoClust is able to employ kernel functions, thanks to its use of inner products. Furthermore, because all similarities lie between 0 and 1, the input of SHCoClust can be sparsified by threshold values, so that less memory and less time are required for storage and computation. This grants SHCoClust scalability, i.e., the ability to process relatively large datasets with limited computing resources. Our experiments demonstrate that SHCoClust significantly outperforms conventional hierarchical clustering methods. In addition, when the input similarity matrices obtained with a linear kernel and with a Gaussian kernel are sparsified, SHCoClust is able to guarantee clustering quality even when its input is largely sparsified. Consequently, a time gain of up to 86% and an average memory gain of 75% are achieved.
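The sparsification step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and it covers only the thresholding of the input similarities, not the hierarchical co-clustering itself. The function names, the toy vectors, the cosine normalization of the linear kernel (one assumed way to keep similarities in [0, 1] for non-negative term vectors), and the threshold value 0.5 are all hypothetical choices made for illustration.

```python
import math


def linear_kernel_similarity(x, y):
    """Inner product normalized by vector norms (assumed normalization).

    For non-negative vectors the result lies in [0, 1].
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm > 0 else 0.0


def gaussian_kernel_similarity(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)); lies in (0, 1]."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def sparsified_similarities(rows, kernel, threshold):
    """Keep only pairs (i, j), i <= j, whose similarity reaches the threshold.

    The result is a dictionary, so dropped pairs cost no storage.
    """
    kept = {}
    for i in range(len(rows)):
        for j in range(i, len(rows)):
            s = kernel(rows[i], rows[j])
            if s >= threshold:
                kept[(i, j)] = s
    return kept


def memory_gain(kept, n):
    """Fraction of the upper-triangular entries that no longer need storing."""
    total = n * (n + 1) // 2
    return 1.0 - len(kept) / total


# Hypothetical toy term-count vectors: two documents per topic.
docs = [
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 3.0, 1.0],
]

sparse_sims = sparsified_similarities(docs, linear_kernel_similarity, threshold=0.5)
gain = memory_gain(sparse_sims, len(docs))
```

In this toy case the cross-topic pairs have similarity 0 and are dropped, while the within-topic pairs survive the threshold. A real collection would more plausibly use a sparse matrix library than a dictionary, which is a design choice outside what the abstract specifies.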
Document type:
Preprint, Working paper
This paper is accepted as a long paper with an oral presentation by the IEEE international confer.. 2017

https://hal.archives-ouvertes.fr/hal-01504986
Contributor: Xinyu Wang
Submitted on: Wednesday, April 12, 2017 - 12:30:26
Last modified on: Tuesday, January 16, 2018 - 15:49:36
Document(s) archived on: Thursday, July 13, 2017 - 12:13:45

File

PID4744355[camera_read_submiss...
Files produced by the author(s)

License

Copyright (All rights reserved)

Identifiers

  • HAL Id : hal-01504986, version 1

Citation

Xinyu Wang, Julien Ah-Pine, Jérôme Darmont. SHCoClust, a Scalable Similarity-based Hierarchical Co-clustering Method and its Application to Textual Collections. This paper is accepted as a long paper with an oral presentation by the IEEE international confer.. 2017. 〈hal-01504986v1〉
