Efficient similarity-based data clustering by optimal object to cluster reallocation

Abstract : We present an iterative flat clustering algorithm designed to operate on arbitrary similarity matrices, with the only constraint that these matrices be symmetric. Although functionally very close to kernel k-means, our proposal maximizes the average intra-class similarity instead of minimizing a squared distance, in order to remain closer to the semantics of similarities. We show that this approach relaxes the conditions on usable matrices and opens up better optimization possibilities. Systematic evaluation on a variety of data sets shows that the proposed approach outperforms or equals kernel k-means in a large majority of cases, while running much faster. Most notably, it significantly reduces memory access, which makes it a good choice for large data collections.
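The abstract describes the approach only at a high level, so the following is a minimal sketch of one possible reading of it, not the authors' implementation. It assumes the objective is the mean, over all objects, of each object's average similarity to the members of its own cluster (self-similarity included), and it recomputes that objective naively for every candidate move rather than using the efficient update the paper reports. The names `avg_intra_similarity` and `reallocate` are hypothetical.

```python
import numpy as np


def avg_intra_similarity(S, labels, k):
    """Assumed objective: mean over objects of their average similarity
    to the members of their own cluster (diagonal included)."""
    total = 0.0
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        if idx.size:
            total += S[np.ix_(idx, idx)].sum() / idx.size
    return total / len(labels)


def reallocate(S, labels, k, max_iter=100):
    """Greedy object-to-cluster reallocation on a symmetric similarity matrix.

    Each object is moved to the cluster that most increases the assumed
    objective; passes repeat until no object moves. Naive cost per pass,
    intended for illustration on small inputs only.
    """
    labels = np.asarray(labels).copy()
    best = avg_intra_similarity(S, labels, k)
    for _ in range(max_iter):
        moved = False
        for i in range(len(labels)):
            home = labels[i]
            if np.count_nonzero(labels == home) == 1:
                continue  # never empty a cluster
            best_c = home
            for c in range(k):
                if c == home:
                    continue
                labels[i] = c  # tentatively move object i to cluster c
                score = avg_intra_similarity(S, labels, k)
                if score > best + 1e-12:
                    best, best_c = score, c
            labels[i] = best_c  # keep the best move, or restore the original
            moved = moved or (best_c != home)
        if not moved:
            break
    return labels, best


# Toy symmetric similarity matrix: two groups of three objects each.
S = np.full((6, 6), 0.1)
S[:3, :3] = 1.0
S[3:, 3:] = 1.0

# Start with object 2 placed in the wrong cluster.
final_labels, final_score = reallocate(S, [0, 0, 1, 1, 1, 1], k=2)
```

Because a move is accepted only when it strictly increases the objective, the score never decreases and the loop terminates; like any greedy scheme of this kind, it may stop at a local optimum that depends on the initial labels.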
Document type :
Journal articles

Cited literature [20 references]

https://hal.archives-ouvertes.fr/hal-01123756
Contributor : Mathieu Lagrange
Submitted on : Tuesday, June 26, 2018 - 2:58:44 PM
Last modification on : Saturday, March 23, 2019 - 1:39:32 AM
Document(s) archived on : Wednesday, September 26, 2018 - 10:01:26 PM

File

rossignolKaveragesPlos.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-01123756, version 2

Citation

Mathias Rossignol, Mathieu Lagrange, Arshia Cont. Efficient similarity-based data clustering by optimal object to cluster reallocation. PLoS ONE, Public Library of Science, 2018. ⟨hal-01123756v2⟩

Metrics

Record views : 125
Files downloads : 105