DENDIS: a new density-based sampling for clustering algorithm - HAL open archive
Journal article, Expert Systems with Applications, Year: 2016

DENDIS: a new density-based sampling for clustering algorithm

French title: DENDIS: un nouvel algorithme d'échantillonnage pour le clustering basé sur la densité

Abstract

To deal with large datasets, sampling can be used as a preprocessing step for clustering. In this paper, a hybrid sampling algorithm is proposed. It is density-based while also using distance concepts to ensure space coverage and fit cluster shapes. At each step a new item is added to the sample: it is chosen as the furthest from the representative in the most important group. A constraint on the hypervolume induced by the samples avoids oversampling in high-density areas. The inner structure allows for internal optimization: only a few distances have to be computed. The algorithm's behavior is investigated using synthetic and real-world datasets and compared to alternative approaches, at conceptual and empirical levels. The numerical experiments show that it is more parsimonious, faster and more accurate, according to the Rand Index, with both k-means and hierarchical clustering algorithms.
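The selection loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the "most important group" is taken to be the group with the most attached items, the hypervolume constraint is replaced by a crude radius threshold, and the distance-saving optimization is omitted (every step computes one distance per item). The function name `density_distance_sample` and the parameters `max_group_fraction` and `min_radius` are hypothetical.

```python
import math
from typing import List, Sequence


def density_distance_sample(
    points: Sequence[Sequence[float]],
    max_group_fraction: float = 0.1,  # assumed stopping rule, not from the paper
    min_radius: float = 0.0,          # assumed stand-in for the hypervolume constraint
) -> List[int]:
    """Return the indices of the sampled items (the representatives)."""
    n = len(points)
    if n == 0:
        return []
    reps = [0]        # start from an arbitrary item (assumption)
    owner = [0] * n   # position in `reps` of each item's nearest representative
    dist = [math.dist(p, points[0]) for p in points]  # distance to that representative
    while True:
        # Size of each group and its member furthest from the representative.
        sizes = [0] * len(reps)
        far = [-1] * len(reps)
        for i in range(n):
            g = owner[i]
            sizes[g] += 1
            if far[g] < 0 or dist[i] > dist[far[g]]:
                far[g] = i
        # A group may be split only if it is still large (density criterion)
        # and not already compact (radius threshold, avoids oversampling).
        candidates = [
            g for g in range(len(reps))
            if sizes[g] > max_group_fraction * n and dist[far[g]] > min_radius
        ]
        if not candidates:
            return reps
        g = max(candidates, key=lambda k: sizes[k])  # largest group
        new = far[g]                                 # its furthest member
        reps.append(new)
        new_pos = len(reps) - 1
        # Attach to the new representative every item that is now closer to it.
        for i in range(n):
            d = math.dist(points[i], points[new])
            if d < dist[i]:
                dist[i] = d
                owner[i] = new_pos
```

Under these assumptions, a dense region keeps being split until each of its groups holds at most `max_group_fraction` of the data, while a small distant group contributes a single representative, which reflects the balance between density and space coverage that the abstract describes.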
Main file
mo2016-pub00048097.pdf (1.51 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01868203, version 1 (5 September 2018)

Identifiers

Cite

F. Ros, S. Guillaume. DENDIS: a new density-based sampling for clustering algorithm. Expert Systems with Applications, 2016, 56, pp.349-359. ⟨10.1016/j.eswa.2016.03.008⟩. ⟨hal-01868203⟩
77 views
232 downloads
