DIDES: a fast and effective sampling for clustering algorithm

F. Ros; S. Guillaume

doi:10.1007/s10115-016-0946-8

Journal article, Knowledge and Information Systems (KAIS), Year: 2017

French title: DIDES: un algorithme d'échantillonnage pour le clustering rapide et efficace

F. Ros

Role: Author

Université d'Orléans

S. Guillaume

Role: Author

Information – Technologies – Analyse Environnementale – Procédés Agricoles

Abstract

As clustering algorithms grow more sophisticated to cope with current needs, namely large data sets of increasing complexity, sampling is likely to provide an interesting alternative. The proposal is a distance-based algorithm: the idea is to iteratively add to the sample the item furthest from all those already selected. Density is handled in a post-processing step, in which either low- or high-density areas are considered. The algorithm has several useful properties: it is insensitive to initialization, data size and noise; it is accurate according to the Rand index; and it avoids many distance calculations thanks to internal optimization. Moreover, it is driven by a single meaningful parameter, called granularity, which affects the sample size. Compared with competing approaches, it proved as powerful as the best-known methods, with the lowest CPU cost.
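The distance-based selection step described in the abstract can be illustrated with a generic farthest-first traversal. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `farthest_first_sample`, the fixed `sample_size` stopping rule and the `start` index are hypothetical, "furthest from all those already selected" is read here as maximizing the distance to the nearest selected item, and neither the granularity parameter nor the density post-processing step is reproduced.

```python
import math


def farthest_first_sample(points, sample_size, start=0):
    """Greedy farthest-first sampling (hypothetical helper, not the DIDES code).

    Assumption: the next item is the one whose distance to its nearest
    already-selected item is largest. DIDES itself stops according to its
    granularity parameter; a fixed sample_size is used here as a stand-in.
    """
    if not 0 < sample_size <= len(points):
        raise ValueError("sample_size must be between 1 and len(points)")
    selected = [start]
    # min_dist[i] holds the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < sample_size:
        # Pick the point whose nearest selected neighbour is farthest away.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        # Only distances to the newly added point have to be computed.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Two tight pairs and one isolated point; the sample spreads across all three areas.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
sample = farthest_first_sample(points, 3)
print(sample)
```

Keeping a running nearest-selected distance per item means each iteration computes distances to the newly added item only, which is one common way to limit distance calculations; whether it matches the internal optimization mentioned in the abstract is not stated on this page.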

Keywords

ALGORITHM, DENSITY, SAMPLING, MATHEMATICS

Domains

Environmental sciences

Main file

mo2017-pub00048098.pdf (1.89 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-01707857

Submitted on: Tuesday, 13 February 2018, 11:01:50

Last modified on: Wednesday, 13 March 2024, 03:23:44

Long-term archiving on: Tuesday, 8 May 2018, 03:03:41

Dates and versions

hal-01707857, version 1 (13-02-2018)

Identifiers

HAL Id: hal-01707857, version 1
DOI: 10.1007/s10115-016-0946-8
IRSTEA: PUB00048098

Cite

F. Ros, S. Guillaume. DIDES: a fast and effective sampling for clustering algorithm. Knowledge and Information Systems (KAIS), 2017, 50 (2), pp.543-568. ⟨10.1007/s10115-016-0946-8⟩. ⟨hal-01707857⟩

Collections

UNIV-ORLEANS IRSTEA AGROPOLIS ITAP INSTITUT-AGRO-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER MATHNUM

95 views

185 downloads
