Conference paper, Year: 2014

Determining the k in k-means with MapReduce

Thibault Debatty
  • Role: Author
Pietro Michiardi
  • Role: Author
  • PersonId : 1084771
Wim Mees
  • Role: Author
  • PersonId : 1008623
Olivier Thonnard
  • Role: Author
  • PersonId : 965107

Abstract

In this paper we propose a MapReduce implementation of G-means, a variant of k-means that automatically determines k, the number of clusters. We show that our implementation scales to very large datasets and very large values of k, as the computation cost is proportional to nk. Other techniques, which run a clustering algorithm with different values of k and choose the value of k that provides the "best" results, have a computation cost proportional to nk². We run experiments that confirm that the processing time is proportional to k. These experiments also show that, because G-means adds new centers progressively, if and where they are needed, it reduces the probability of falling into a local minimum and ultimately finds better centers than classical k-means.
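
The gap between a cost proportional to nk and one proportional to nk² can be illustrated with a small counting model. This is only a sketch under stated assumptions, not the authors' analysis: n is taken to be the number of points, one clustering pass over k centers is assumed to cost n·k distance computations, the "try every k" baseline is assumed to test each value from 1 to k, and the progressive approach is assumed to at most double the number of centers per round. The function names are hypothetical.

```python
def multi_k_cost(n: int, k_max: int) -> int:
    """Distance computations when one pass is run for every candidate k in 1..k_max.

    Assumption: a pass with k centers costs n * k distance computations.
    The total is n * k_max * (k_max + 1) / 2, which grows like n * k_max**2.
    """
    return sum(n * k for k in range(1, k_max + 1))


def progressive_cost(n: int, k_final: int) -> int:
    """Distance computations when centers are added progressively.

    Assumption: the number of centers doubles each round (1, 2, 4, ...)
    until it reaches k_final, so the total stays below 3 * n * k_final.
    """
    cost, k = 0, 1
    while k < k_final:
        cost += n * k   # one pass with the current number of centers
        k *= 2          # worst case: every center is split in two
    return cost + n * k_final  # final pass with k_final centers


if __name__ == "__main__":
    n = 1_000_000
    for k in (16, 64, 256):
        ratio = multi_k_cost(n, k) / progressive_cost(n, k)
        print(f"k={k}: baseline is {ratio:.1f}x more expensive")
```

Under these assumptions the baseline's cost grows quadratically in k while the progressive cost grows linearly, so the ratio between the two itself grows roughly linearly with k.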
Main file
mrandbeyond2013_debatty.pdf (521.59 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01525708, version 1 (30-05-2017)

Identifiers

  • HAL Id: hal-01525708, version 1

Cite

Thibault Debatty, Pietro Michiardi, Wim Mees, Olivier Thonnard. Determining the k in k-means with MapReduce. EDBT/ICDT 2014 Joint Conference, Mar 2014, Athens, Greece. ⟨hal-01525708⟩

Collections

EURECOM
194 views
1050 downloads
