
Efficient similarity-based data clustering by optimal object to cluster reallocation

Abstract : We present an iterative flat clustering algorithm designed to operate on arbitrary similarity matrices, with the only constraint that these matrices be symmetric. Although functionally very close to kernel k-means, our proposal performs a maximization of average intra-class similarity, instead of a squared distance minimization, in order to remain closer to the semantics of similarities. We show that this approach allows relaxing the conditions on usable matrices, as well as opening better optimization possibilities. Systematic evaluation on a variety of data sets shows that the proposed approach outperforms or equals kernel k-means in a large majority of cases, while running much faster. Most notably, it significantly reduces memory access, which makes it a good choice for large data collections.
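To make the idea in the abstract concrete, here is a minimal sketch of object-to-cluster reallocation on a symmetric similarity matrix. It is not the authors' implementation. The objective (sum, over clusters, of intra-cluster similarities divided by cluster size), the brute-force re-evaluation of that objective after each tentative move, and the names `objective` and `reallocate` are all assumptions made for illustration. The efficient updates and reduced memory access that the abstract refers to are not reproduced.

```python
import numpy as np

def objective(S, labels, k):
    """Assumed objective: for each cluster, sum of intra-cluster
    similarities divided by the cluster size, added over clusters."""
    total = 0.0
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size:
            total += S[np.ix_(members, members)].sum() / members.size
    return total

def reallocate(S, k, n_iter=50, seed=0):
    """Greedy reallocation: move one object at a time to another cluster
    whenever the move strictly increases the assumed objective."""
    S = np.asarray(S, dtype=float)
    if not np.allclose(S, S.T):
        raise ValueError("similarity matrix must be symmetric")
    n = S.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)  # random initial partition
    best = objective(S, labels, k)
    for _ in range(n_iter):
        moved = False
        for i in range(n):
            current = labels[i]
            for c in range(k):
                if c == current:
                    continue
                labels[i] = c  # tentative move, scored by brute force
                score = objective(S, labels, k)
                if score > best + 1e-12:
                    best, current, moved = score, c, True
            labels[i] = current  # keep the best cluster found for object i
        if not moved:  # no object can be moved profitably: local optimum
            break
    return labels, best

# Toy similarity matrix with two obvious groups: {0, 1, 2} and {3, 4, 5}.
S = np.zeros((6, 6))
S[:3, :3] = 1.0
S[3:, 3:] = 1.0
labels, score = reallocate(S, k=2)
```

Because each accepted move strictly increases the objective and the number of partitions is finite, the loop stops at a local optimum. Only symmetry of `S` is checked, in line with the abstract's statement that no further condition is imposed on the matrix.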
Document type :
Journal articles

Cited literature [29 references]
Contributor : Mathieu Lagrange
Submitted on : Tuesday, June 26, 2018 - 2:58:44 PM
Last modification on : Friday, August 5, 2022 - 2:54:51 PM
Long-term archiving on : Wednesday, September 26, 2018 - 10:01:26 PM


Publisher files allowed on an open archive


  • HAL Id : hal-01123756, version 2


Mathias Rossignol, Mathieu Lagrange, Arshia Cont. Efficient similarity-based data clustering by optimal object to cluster reallocation. PLoS ONE, Public Library of Science, 2018. ⟨hal-01123756v2⟩


