Decentralized K-means using randomized Gossip protocols for clustering large datasets

Jérôme Fellus (1), David Picard (1), Philippe-Henri Gosselin (2, 1)
(1) MIDI, ETIS - Equipes Traitement de l'Information et Systèmes
(2) TEXMEX - Multimedia content-based indexing, IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: In this paper, we consider the clustering of very large datasets distributed over a network of computational units using a decentralized K-means algorithm. To obtain the same codebook at each node of the network, we use a randomized gossip aggregation protocol in which only small messages are exchanged. We show theoretically that the algorithm is equivalent to a centralized K-means, provided that the number of messages each node sends meets a given bound. Experiments show that consensus is reached for a number of messages consistent with the bound, and also for a smaller number of messages, albeit with a less smooth evolution of the objective function.
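The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' protocol: it assumes symmetric pairwise averaging between randomly chosen nodes (the paper's exact gossip scheme and message bound are not reproduced here), and all function names and parameters (`local_stats`, `gossip_average`, `decentralized_kmeans`, `n_exchanges`) are hypothetical. Each node computes per-cluster sums and counts on its own data, the nodes average these small statistics by gossip, and each node then updates its codebook from the averaged values. Because pairwise averaging preserves network-wide totals, the ratio of averaged sum to averaged count approaches the centroid a centralized K-means step would compute.

```python
import random

def assign(point, centroids):
    """Return the index of the nearest centroid (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))

def local_stats(points, centroids):
    """Per-cluster coordinate sums and counts for one node's local data."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0.0] * len(centroids)
    for p in points:
        k = assign(p, centroids)
        counts[k] += 1.0
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts

def gossip_average(stats, n_exchanges, rng):
    """Assumed gossip scheme: two random nodes replace their statistics
    with the pair's mean. Network-wide totals are preserved."""
    n = len(stats)
    for _ in range(n_exchanges):
        i, j = rng.sample(range(n), 2)
        (si, ci), (sj, cj) = stats[i], stats[j]
        s = [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(si, sj)]
        c = [(a + b) / 2 for a, b in zip(ci, cj)]
        stats[i] = (s, c)
        stats[j] = ([row[:] for row in s], c[:])
    return stats

def decentralized_kmeans(node_data, centroids, n_iters=10, n_exchanges=200, seed=0):
    """Run K-means where nodes only exchange small (sums, counts) messages."""
    rng = random.Random(seed)
    # Assumption: every node starts from the same initial codebook.
    codebooks = [[c[:] for c in centroids] for _ in node_data]
    for _ in range(n_iters):
        stats = [local_stats(pts, cb) for pts, cb in zip(node_data, codebooks)]
        stats = gossip_average(stats, n_exchanges, rng)
        for cb, (sums, counts) in zip(codebooks, stats):
            for k, cnt in enumerate(counts):
                if cnt > 0:  # keep the old centroid if the cluster is empty
                    cb[k] = [s / cnt for s in sums[k]]
    return codebooks

# Toy usage: four nodes, one-dimensional points, two clusters.
node_data = [
    [[0.0], [1.0]],
    [[10.0], [11.0]],
    [[0.5], [10.5]],
    [[1.5], [9.5]],
]
codebooks = decentralized_kmeans(node_data, [[0.0], [10.0]])
```

In this sketch, lowering `n_exchanges` leaves the nodes with less closely matching codebooks after each iteration, which loosely mirrors the trade-off between message count and consensus quality that the abstract describes.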
Document type: Conference paper

Cited literature: 21 references

https://hal.archives-ouvertes.fr/hal-00915822
Contributor: Philippe-Henri Gosselin
Submitted on: Monday, December 9, 2013 - 1:26:33 PM
Last modification on: Friday, November 16, 2018 - 1:22:49 AM
Long-term archiving on: Sunday, March 9, 2014 - 11:26:03 PM

File

fellus13kdcloud.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00915822, version 1

Citation

Jérôme Fellus, David Picard, Philippe-Henri Gosselin. Decentralized K-means using randomized Gossip protocols for clustering large datasets. International Workshop on Knowledge Discovery Using Cloud and Distributed Computing Platforms, Dec 2013, Dallas, Texas, United States. pp.8. ⟨hal-00915822⟩
