Efficient similarity-based data clustering by optimal object to cluster reallocation

Abstract: We present an iterative flat clustering algorithm designed to operate on arbitrary similarity matrices, with the only constraint that these matrices be symmetric. Although functionally very close to kernel k-means, our proposal performs a maximization of average intra-class similarity, instead of a squared distance minimization, in order to remain closer to the semantics of similarities. We show that this approach relaxes the conditions on usable matrices and opens up better optimization possibilities. Systematic evaluation on a variety of data sets shows that the proposed approach outperforms or equals kernel k-means in a large majority of cases, while running much faster. Most notably, it significantly reduces memory access, which makes it a good choice for large data collections.
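The idea of reallocating objects to maximize average intra-class similarity can be illustrated with a small sketch. This is not the authors' implementation: the objective used here (sum over clusters of the total within-cluster similarity divided by the cluster size, diagonal included), the greedy sweep order, and the names `objective` and `reallocate` are all assumptions made for illustration, and the brute-force recomputation ignores the optimizations the abstract refers to.

```python
def objective(S, labels, k):
    """Assumed objective: for each non-empty cluster, the sum of S[i][j]
    over all member pairs (diagonal included) divided by the cluster size."""
    total = 0.0
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            total += sum(S[i][j] for i in members for j in members) / len(members)
    return total


def reallocate(S, labels, k, max_sweeps=100):
    """Greedy sweeps: move each object to the cluster that most increases
    the objective; stop when a full sweep moves nothing."""
    labels = list(labels)
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(labels)):
            current = labels[i]
            if labels.count(current) == 1:
                continue  # illustrative choice: never empty a cluster
            best_c, best_val = current, objective(S, labels, k)
            for c in range(k):
                if c == current:
                    continue
                labels[i] = c  # tentatively move object i to cluster c
                val = objective(S, labels, k)
                if val > best_val + 1e-12:  # accept strict improvements only
                    best_c, best_val = c, val
            labels[i] = best_c  # keep the best cluster found (or the original)
            if best_c != current:
                moved = True
        if not moved:
            break
    return labels


# Toy symmetric similarity matrix: two groups of three objects.
n = 6
S = [[1.0 if i == j else (0.9 if i // 3 == j // 3 else 0.1) for j in range(n)]
     for i in range(n)]
initial = [0, 1, 0, 1, 0, 1]
final = reallocate(S, initial, k=2)
```

Only symmetry of `S` is relied on in this sketch; nothing requires the matrix to be a valid kernel, which is in the spirit of the relaxed conditions the abstract mentions.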
Document type:
Journal article
PLoS ONE, Public Library of Science, 2018


Contributor: Mathieu Lagrange
Submitted on: Tuesday, June 26, 2018 - 14:58:44
Last modified on: Thursday, February 7, 2019 - 14:48:22
Document(s) archived on: Wednesday, September 26, 2018 - 22:01:26


Publisher files allowed on an open archive


  • HAL Id : hal-01123756, version 2


Mathias Rossignol, Mathieu Lagrange, Arshia Cont. Efficient similarity-based data clustering by optimal object to cluster reallocation. PLoS ONE, Public Library of Science, 2018. 〈hal-01123756v2〉


