Efficient similarity-based data clustering by optimal object to cluster reallocation

Abstract: We present an iterative flat clustering algorithm designed to operate on arbitrary similarity matrices, with the only constraint that these matrices be symmetric. Although functionally very close to kernel k-means, our proposal maximizes average intra-class similarity instead of minimizing a squared distance, in order to remain closer to the semantics of similarities. We show that this approach relaxes the conditions on usable matrices and opens up better optimization possibilities. Systematic evaluation on a variety of data sets shows that the proposed approach outperforms or equals kernel k-means in a large majority of cases, while running much faster. Most notably, it significantly reduces memory access, which makes it a good choice for large data collections.
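
As a rough illustration of the idea summarized in the abstract, the sketch below reallocates objects one at a time to whichever cluster most increases an average intra-class similarity score. It is a naive toy under stated assumptions, not the authors' implementation: the objective used here (the mean, over objects, of each object's average similarity to the members of its own cluster, itself included) is an assumed definition, the function names `avg_intra_similarity` and `reallocate` are hypothetical, and the score is recomputed from scratch for every candidate move, so none of the speed or memory-access gains reported in the abstract are reproduced.

```python
import numpy as np


def avg_intra_similarity(S, labels, k):
    """Assumed objective: mean over objects of their average similarity
    to the members of their own cluster (the object itself included)."""
    total = 0.0
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size:
            total += S[np.ix_(members, members)].sum() / members.size
    return total / len(labels)


def reallocate(S, labels, k, max_sweeps=100):
    """Greedy object-to-cluster reallocation (naive, full recomputation)."""
    labels = np.asarray(labels).copy()
    best = avg_intra_similarity(S, labels, k)
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(labels)):
            src = labels[i]
            if np.count_nonzero(labels == src) == 1:
                continue  # never empty a cluster
            best_c = src
            for c in range(k):
                if c == src:
                    continue
                labels[i] = c  # tentative move, scored from scratch
                score = avg_intra_similarity(S, labels, k)
                if score > best + 1e-12:
                    best, best_c = score, c
            labels[i] = best_c  # keep the best move, or restore src
            moved = moved or best_c != src
        if not moved:
            break  # a full sweep without any move: local optimum
    return labels, best


# Toy symmetric similarity matrix: two blocks of three objects each,
# similarity 1.0 inside a block and 0.1 across blocks.
S = np.full((6, 6), 0.1)
S[:3, :3] = 1.0
S[3:, 3:] = 1.0

# Object 2 starts in the wrong cluster and is moved back to cluster 0.
init = np.array([0, 0, 1, 1, 1, 1])
final_labels, final_score = reallocate(S, init, k=2)
```

Every accepted move strictly increases the score, so the loop terminates at a local optimum. The only requirement placed on `S` in this sketch is symmetry, which mirrors the constraint stated in the abstract.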
Document type: Journal article
PLoS ONE, Public Library of Science, 2018

Cited literature: 20 references

https://hal.archives-ouvertes.fr/hal-01123756
Contributor: Mathieu Lagrange
Submitted on: Tuesday, 26 June 2018 - 14:58:44
Last modified on: Friday, 31 August 2018 - 09:18:09
Document(s) archived on: Wednesday, 26 September 2018 - 22:01:26

File

rossignolKaveragesPlos.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-01123756, version 2

Citation

Mathias Rossignol, Mathieu Lagrange, Arshia Cont. Efficient similarity-based data clustering by optimal object to cluster reallocation. PLoS ONE, Public Library of Science, 2018. 〈hal-01123756v2〉

Metrics

Record views: 74

File downloads: 42