Decentralized K-means using randomized Gossip protocols for clustering large datasets

Jérôme Fellus 1, David Picard 1, Philippe-Henri Gosselin 2, 1
1 ETIS - Equipes Traitement de l'Information et Systèmes
2 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : In this paper, we consider the clustering of very large datasets distributed over a network of computational units using a decentralized K-means algorithm. To obtain the same codebook at each node of the network, we use a randomized gossip aggregation protocol where only small messages are exchanged. We theoretically show the equivalence of the algorithm with a centralized K-means, provided a bound on the number of messages each node has to send is met. We provide experiments showing that the consensus is reached for a number of messages consistent with the bound, but also for a smaller number of messages, albeit with a less smooth evolution of the objective function.
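The aggregation idea in the abstract can be illustrated with a minimal sketch of generic pairwise randomized gossip averaging. This is not necessarily the protocol used in the paper: the function name `gossip_average`, the fully connected topology, the fixed seed, and the toy values are all assumptions made for illustration. Each exchange picks two nodes at random and replaces both of their estimates with the pair's mean, so every message carries a single number and the network-wide sum is preserved. In a decentralized K-means, quantities such as per-cluster sums and counts could plausibly be aggregated in this fashion so that every node ends up with the same codebook.

```python
import random


def gossip_average(values, num_exchanges, seed=0):
    """Generic pairwise gossip averaging (illustrative, not the paper's exact protocol).

    Each exchange selects two distinct nodes uniformly at random and sets
    both of their estimates to the mean of the pair.
    """
    rng = random.Random(seed)       # fixed seed so the run is reproducible
    estimates = list(values)        # one local estimate per node
    n = len(estimates)
    for _ in range(num_exchanges):
        i, j = rng.sample(range(n), 2)          # two distinct node indices
        pair_mean = (estimates[i] + estimates[j]) / 2.0
        estimates[i] = pair_mean                # both nodes adopt the pair mean,
        estimates[j] = pair_mean                # which keeps the global sum unchanged
    return estimates


# Hypothetical toy data: one scalar held by each of four nodes.
local_values = [1.0, 5.0, 9.0, 13.0]
true_mean = sum(local_values) / len(local_values)
estimates = gossip_average(local_values, num_exchanges=200)
```

After enough exchanges, every entry of `estimates` is close to `true_mean`, which is the consensus behaviour the abstract refers to; with fewer exchanges the nodes would still disagree slightly.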
Document type :
Conference papers

Contributor : Philippe-Henri Gosselin
Submitted on : Monday, December 9, 2013 - 1:26:33 PM
Last modification on : Friday, November 16, 2018 - 1:22:49 AM
Long-term archiving on : Sunday, March 9, 2014 - 11:26:03 PM


Files produced by the author(s)


  • HAL Id : hal-00915822, version 1


Jérôme Fellus, David Picard, Philippe-Henri Gosselin. Decentralized K-means using randomized Gossip protocols for clustering large datasets. International Workshop on Knowledge Discovery Using Cloud and Distributed Computing Platforms, Dec 2013, Dallas, Texas, United States. pp.8. ⟨hal-00915822⟩


