Dirichlet Process Mixture Models made Scalable and Effective by means of Massive Distribution

Khadidja Meguelati (1), Bénédicte Fontez (2), Nadine Hilgert (2), Florent Masseglia (1)
(1) ZENITH - Scientific Data Management; LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier; CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: Clustering with accurate results has become a topic of high interest. The Dirichlet Process Mixture (DPM) is a clustering model that has the advantage of discovering the number of clusters automatically and offers attractive properties, e.g., potential convergence to the actual clusters in the data. These advantages come at the price of prohibitive response times, which impairs its adoption and makes centralized DPM approaches inefficient. We propose DC-DPM, a parallel clustering solution that gracefully scales to millions of data points while remaining DPM compliant, which is the main challenge when distributing this process. Our experiments on both synthetic and real-world data illustrate the high performance of our approach on millions of data points. The centralized algorithm does not scale and reaches its limit at 100K data points, where it needs more than 7 hours; on the same data, our approach needs less than 30 seconds.
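
The ability of a DPM to discover the number of clusters automatically comes from its prior over partitions. The following minimal Python sketch assumes the standard Chinese restaurant process (CRP) view of the Dirichlet process, with a hypothetical function `crp_assignments` and concentration parameter `alpha`; it only samples the prior to show how new clusters open as points arrive, and it is not the DC-DPM algorithm or the authors' sampler.

```python
import random


def crp_assignments(n, alpha, seed=0):
    """Sample cluster labels for n points from a Chinese restaurant
    process (CRP) prior with concentration parameter alpha > 0."""
    rng = random.Random(seed)   # seeded for reproducibility
    counts = []                 # counts[k] = points already in cluster k
    labels = []
    for i in range(n):
        # Point i joins existing cluster k with probability counts[k] / (i + alpha)
        # or opens a new cluster with probability alpha / (i + alpha);
        # the weights sum to i + alpha because sum(counts) == i.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)    # a new cluster is created
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    for n in (100, 10_000):
        _, counts = crp_assignments(n, alpha=1.0)
        print(n, "points ->", len(counts), "clusters")
```

Under this prior the expected number of clusters after n points is the sum of alpha / (alpha + i) for i = 0 .. n-1, roughly alpha ln n for large n, so the number of clusters is not fixed in advance. A full DPM sampler would additionally weight each choice by the likelihood of the point under each cluster, and repeating that per-point sweep over the whole dataset is one reason centralized inference becomes expensive on large data.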
Document type: Conference paper
SAC: Symposium on Applied Computing, Apr 2019, Limassol, Cyprus. 34th ACM/SIGAPP Symposium On Applied Computing, 2019, https://www.sigapp.org/sac/sac2019/. DOI: 10.1145/3297280.3297327

https://hal.archives-ouvertes.fr/hal-01999453
Contributor: Florent Masseglia
Submitted on: Wednesday, January 30, 2019 - 10:09:48
Last modified on: Thursday, February 7, 2019 - 16:32:11

File: ACM_SigConf_SAC2019.pdf (files produced by the author(s))

Citation

Khadidja Meguelati, Bénédicte Fontez, Nadine Hilgert, Florent Masseglia. Dirichlet Process Mixture Models made Scalable and Effective by means of Massive Distribution. SAC: Symposium on Applied Computing, Apr 2019, Limassol, Cyprus. 34th ACM/SIGAPP Symposium On Applied Computing, 2019, https://www.sigapp.org/sac/sac2019/. DOI: 10.1145/3297280.3297327. hal-01999453