Conference papers

Dirichlet Process Mixture Models made Scalable and Effective by means of Massive Distribution

Khadidja Meguelati 1, Bénédicte Fontez 2, Nadine Hilgert 2, Florent Masseglia 1
1 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: Clustering with accurate results has become a topic of high interest. Dirichlet Process Mixture (DPM) is a clustering model with the advantage of discovering the number of clusters automatically and offering useful properties such as its potential convergence to the actual clusters in the data. These advantages come at the price of prohibitive response times, which impairs its adoption and makes centralized DPM approaches inefficient. We propose DC-DPM, a parallel clustering solution that gracefully scales to millions of data points while remaining DPM compliant, which is the challenge in distributing this process. Our experiments, on both synthetic and real-world data, illustrate the high performance of our approach on millions of data points. The centralized algorithm does not scale: it reaches its limit at 100K data points, where it needs more than 7 hours. On the same data, our approach needs less than 30 seconds.
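The abstract's claim that a DPM discovers the number of clusters automatically can be illustrated with the Chinese Restaurant Process (CRP), the standard predictive view of the Dirichlet process prior over partitions. The minimal sketch below is not DC-DPM and not the authors' code. The function name `sample_crp_partition` and the concentration parameter `alpha` are illustrative assumptions. Each new point joins an existing cluster with probability proportional to that cluster's current size, or opens a new cluster with probability proportional to `alpha`. The number of clusters is therefore drawn from the model rather than fixed in advance.

```python
import random


def sample_crp_partition(n, alpha, seed=None):
    """Draw cluster labels for n points from a Chinese Restaurant Process.

    Hypothetical helper for illustration only (not the DC-DPM algorithm).
    alpha is the DP concentration parameter; larger values favor more clusters.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    counts = []   # counts[k] = number of points already assigned to cluster k
    labels = []   # labels[i] = cluster index of point i
    for _ in range(n):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        labels.append(k)
    return labels


if __name__ == "__main__":
    # The number of clusters is an outcome of sampling, not an input.
    for alpha in (0.5, 2.0, 10.0):
        labels = sample_crp_partition(1000, alpha, seed=42)
        print(f"alpha={alpha}: {len(set(labels))} clusters")
```

Larger values of `alpha` tend to produce more clusters, and under the CRP the expected number of clusters grows roughly logarithmically in the number of points. In a full DPM sampler (for example, collapsed Gibbs sampling), these prior weights are combined with each cluster's likelihood for the point being reassigned.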

Cited literature: 27 references
Contributor: Florent Masseglia
Submitted on: Wednesday, January 30, 2019 - 10:09:48 AM
Last modification on: Thursday, July 2, 2020 - 2:14:19 PM


Khadidja Meguelati, Bénédicte Fontez, Nadine Hilgert, Florent Masseglia. Dirichlet Process Mixture Models made Scalable and Effective by means of Massive Distribution. SAC 2019 - 34th Symposium On Applied Computing, Apr 2019, Limassol, Cyprus. pp.502-509, ⟨10.1145/3297280.3297327⟩. ⟨hal-01999453⟩


