Accelerating one-pass clustering by cluster selection racing

Abstract : This paper introduces a racing mechanism into the cluster selection step of one-pass clustering algorithms. We focus on cases where the data are not numerical vectors and where it is not necessarily possible to compute a mean for each cluster. In such cases, the distance from each point to the existing clusters can be computed exhaustively, but only at a quadratic cost that is intractable for most present-day use cases. We first introduce a stochastic approach, based on Hoeffding and Bernstein bounds, for estimating the distance from each new data point to the existing clusters; it reduces the number of computations by simultaneously selecting the amount of data to sample and eliminating non-competitive clusters. Second, we show that the efficiency of this approach can be improved by reducing the theoretical values of the Hoeffding and Bernstein bounds. Tested on real data sets, our algorithms significantly accelerate one-pass clustering while making fewer errors (or none, depending on the parameters) than a one-pass clustering algorithm that uses a fixed number of comparisons with each cluster.
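The racing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify: it is a generic Hoeffding race written under stated assumptions. Distances are assumed to be bounded in [0, value_range], cluster members are sampled uniformly with replacement, the confidence level delta is applied per interval with no correction for multiple comparisons, and the Bernstein variant and the reduced bounds mentioned in the abstract are omitted. All function and parameter names (race_nearest_cluster, hoeffding_radius, jaccard_distance, max_samples) are hypothetical.

```python
import math
import random


def hoeffding_radius(n, value_range, delta):
    """Half-width of a two-sided Hoeffding confidence interval after n samples."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def jaccard_distance(a, b):
    """Example bounded distance on non-numerical data (strings as character sets)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def race_nearest_cluster(point, clusters, dist, value_range=1.0,
                         delta=0.05, max_samples=200, rng=None):
    """Estimate the closest cluster by racing sampled mean distances.

    Each round draws one random member from every surviving cluster,
    updates the running mean distance, and drops any cluster whose lower
    confidence bound exceeds the best upper confidence bound.
    Returns (index of the selected cluster, number of rounds used).
    """
    rng = rng or random.Random(0)
    alive = set(range(len(clusters)))
    sums = [0.0] * len(clusters)
    n = 0
    while len(alive) > 1 and n < max_samples:
        n += 1
        for k in alive:
            member = rng.choice(clusters[k])  # assumption: sampling with replacement
            sums[k] += dist(point, member)
        eps = hoeffding_radius(n, value_range, delta)
        best_upper = min(sums[k] / n for k in alive) + eps
        # Keep only clusters that could still be the closest one.
        alive = {k for k in alive if sums[k] / n - eps <= best_upper}
    best = min(alive, key=lambda k: sums[k] / max(n, 1))
    return best, n


clusters = [["aaa", "aab", "aba"], ["xyz", "xyy", "zzy"]]
best, rounds = race_nearest_cluster("aab", clusters, jaccard_distance)
```

In this toy run, every member of the second cluster is at distance 1 from the query, so the first cluster is selected whether or not the race ends before max_samples. The cap on rounds plays the role of the fixed comparison budget that the abstract uses as its baseline.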
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01215197
Contributor : Lip6 Publications
Submitted on : Tuesday, October 13, 2015 - 4:51:04 PM
Last modification on : Thursday, March 21, 2019 - 12:59:11 PM

Citation

Nicolas Labroche, Marcin Detyniecki, Thomas Bärecke. Accelerating one-pass clustering by cluster selection racing. IEEE International Conference on Tools with Artificial Intelligence - ICTAI 2013, Nov 2013, Washington DC, United States. pp.491-498, ⟨10.1109/ICTAI.2013.79⟩. ⟨hal-01215197⟩
