Decentralized K-means using randomized Gossip protocols for clustering large datasets

Jérôme Fellus; David Picard; Philippe-Henri Gosselin

Conference paper, Year: 2013

Jérôme Fellus

Role: Author

Multimedia Indexation and Data Integration

David Picard

Role: Author
PersonId : 741
IdHAL : david-picard
ORCID : 0000-0002-6296-4222
IdRef : 133005216

Multimedia Indexation and Data Integration

Philippe-Henri Gosselin

Role: Author
PersonId : 9334
IdHAL : philippe-henri-gosselin
IdRef : 106963392

Multimedia content-based indexing

Multimedia Indexation and Data Integration

Abstract

In this paper, we consider the clustering of very large datasets distributed over a network of computational units using a decentralized K-means algorithm. To obtain the same codebook at each node of the network, we use a randomized gossip aggregation protocol in which only small messages are exchanged. We show theoretically that the algorithm is equivalent to a centralized K-means, provided that a bound on the number of messages each node has to send is met. We provide experiments showing that consensus is reached for a number of messages consistent with the bound, and also for a smaller number of messages, albeit with a less smooth evolution of the objective function.
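The abstract does not spell out the protocol, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes a fully connected network, identical initial codebooks on every node, and a push-style gossip in which a randomly chosen sender keeps half of its per-cluster sums and counts and pushes the other half to a random peer. All function names and parameters (`local_stats`, `gossip`, `decentralized_kmeans`, `n_messages`) are hypothetical, and `n_messages` here counts messages over the whole network per K-means iteration rather than per node.

```python
import random

def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))

def local_stats(points, centroids):
    """Per-cluster coordinate sums and counts for one node's local data."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0.0] * len(centroids)
    for p in points:
        k = nearest(p, centroids)
        counts[k] += 1.0
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts

def gossip(stats, n_messages, rng):
    """Assumed push gossip: a random sender halves its statistics and adds
    the other half to a random peer, so the network-wide totals are preserved."""
    n = len(stats)
    for _ in range(n_messages):
        i = rng.randrange(n)
        j = rng.choice([x for x in range(n) if x != i])
        sums_i, counts_i = stats[i]
        sums_j, counts_j = stats[j]
        for k in range(len(counts_i)):
            counts_i[k] /= 2.0
            counts_j[k] += counts_i[k]
            for d in range(len(sums_i[k])):
                sums_i[k][d] /= 2.0
                sums_j[k][d] += sums_i[k][d]

def decentralized_kmeans(node_data, init_centroids, n_iters, n_messages, seed=0):
    """Each node keeps its own codebook; only sums and counts are exchanged."""
    rng = random.Random(seed)
    codebooks = [[list(c) for c in init_centroids] for _ in node_data]
    for _ in range(n_iters):
        stats = [local_stats(pts, cb) for pts, cb in zip(node_data, codebooks)]
        gossip(stats, n_messages, rng)
        for cb, (sums, counts) in zip(codebooks, stats):
            for k, count in enumerate(counts):
                if count > 1e-12:  # keep the old centroid if no mass arrived
                    cb[k] = [s / count for s in sums[k]]
    return codebooks

# Toy data (made up for illustration): 4 nodes, two well-separated 2-D clusters.
data_rng = random.Random(1)
node_data = [
    [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
     for cx, cy in [(0.0, 0.0), (10.0, 10.0)] for _ in range(20)]
    for _ in range(4)
]
init = [(1.0, 1.0), (9.0, 9.0)]
codebooks = decentralized_kmeans(node_data, init, n_iters=5, n_messages=400)
for node_id, cb in enumerate(codebooks):
    print(node_id, [[round(v, 3) for v in c] for c in cb])
```

Because the halving step preserves the total sum and total count of each cluster, the ratio of sum to count held by any node drifts toward the global centroid as messages accumulate. Lowering `n_messages` in this sketch is one way to observe how the per-node codebooks disagree when too few messages are exchanged.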

Domains

Machine Learning [stat.ML]

Main file

fellus13kdcloud.pdf (385.5 KB)

Origin: Files produced by the author(s)

Contributor: Philippe-Henri Gosselin

https://hal.science/hal-00915822

Submitted on: Monday, December 9, 2013, 13:26:33

Last modified on: Friday, March 24, 2023, 14:52:58

Long-term archiving on: Sunday, March 9, 2014, 23:26:03

Dates and versions

hal-00915822 , version 1 (09-12-2013)

Identifiers

HAL Id : hal-00915822 , version 1

Cite

Jérôme Fellus, David Picard, Philippe-Henri Gosselin. Decentralized K-means using randomized Gossip protocols for clustering large datasets. International Workshop on Knowledge Discovery Using Cloud and Distributed Computing Platforms, Dec 2013, Dallas, Texas, United States. pp.8. ⟨hal-00915822⟩

Collections

EC-PARIS UNIV-RENNES1 CNRS INRIA UNIV-CERGY INSA-RENNES IRISA ETIS ETIS-MIDI IRISA-D6 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES INSA-GROUPE CY-TECH-SM UR1-MATH-NUM

479 views

469 downloads
