KD-means: clustering method for massive data based on kd-tree

Nabil El Malki; Franck Ravat; Olivier Teste

Conference paper. Year: 2020


Nabil El Malki

Role: Author
PersonId: 1095087
IdRef: 24227143X

Systèmes d’Informations Généralisées

Capgemini [Toulouse]

Franck Ravat

Role: Author
PersonId: 735308
IdHAL: franck-ravat
ORCID: 0000-0003-4820-841X
IdRef: 127663711

Systèmes d’Informations Généralisées

Université Toulouse Capitole

Olivier Teste

Role: Author
PersonId: 117280
IdHAL: olivier-teste
ORCID: 0000-0003-0338-9886
IdRef: 076013286

Systèmes d’Informations Généralisées

Université Toulouse - Jean Jaurès

Abstract

K-means clustering is a popular unsupervised classification algorithm used in several domains, e.g., imaging, segmentation, and compression. However, the number of clusters k, which must be fixed a priori, largely determines the clustering quality. Current state-of-the-art k-means implementations can set the number of clusters automatically, but they require unreasonable processing time when clustering large volumes of data. In this paper, we propose a novel kd-tree-based solution that determines the number of clusters k for massive data, intended for data preprocessing in data science projects and for near-real-time applications. We demonstrate that our solution outperforms current solutions in terms of both clustering quality and processing time on massive data.
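To make concrete why a fixed, a priori k matters, here is a minimal sketch of standard (Lloyd's) k-means. It is not the authors' KD-means method, which this abstract does not describe, and it uses no kd-tree. The function names `kmeans`, `squared_distance` and `sse` are hypothetical illustrations. The caller must supply k, and the within-cluster sum of squared errors (SSE) is computed so that results for different k can be compared, which is one common baseline way of reasoning about the choice of k.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's k-means (hypothetical helper); k must be given up front."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        # Stop once neither assignments nor centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels


def sse(points: List[Point], centroids: List[Point], labels: List[int]) -> float:
    """Within-cluster sum of squared distances to the assigned centroids."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    for k in (1, 2):
        cents, labs = kmeans(data, k)
        print(f"k={k}: SSE={sse(data, cents, labs)}")  # k=1: 201.0, k=2: 1.0
```

On this toy data the SSE drops from 201.0 to 1.0 when k goes from 1 to 2. Repeating full clustering runs over many candidate values of k illustrates why naive searches for k can become costly on large data.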

Domains

Databases [cs.DB]

Main file

paper7.pdf (1.04 MB)

Origin: publisher files allowed on an open archive


https://hal.science/hal-03080514

Submitted on: Thursday, December 17, 2020, 17:34:50

Last modified on: Wednesday, February 7, 2024, 16:05:48

Long-term archiving on: Thursday, March 18, 2021, 20:27:21

Dates and versions

hal-03080514, version 1 (17-12-2020)

License

Attribution - NonCommercial - NoDerivatives

Identifiers

HAL Id: hal-03080514, version 1

Cite

Nabil El Malki, Franck Ravat, Olivier Teste. KD-means: clustering method for massive data based on kd-tree. 22nd International Workshop On Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP 2020), Mar 2020, Copenhagen, Denmark. ⟨hal-03080514⟩


Collections

UNIV-TLSE2 CNRS SMS UT1-CAPITOLE IRIT IRIT-SIG IUT-BLAGNAC IRIT-GD IRIT-UT1C IRIT-UT2J TOULOUSE-INP UNIV-UT3 UT3-TOULOUSEINP
