KD-means: clustering method for massive data based on kd-tree - HAL open archive
Conference paper, Year: 2020


Abstract

K-means clustering is a popular unsupervised classification algorithm employed in several domains, e.g., imaging, segmentation, or compression. Nevertheless, the number of clusters k, which must be fixed a priori, strongly affects the clustering quality. Current state-of-the-art k-means implementations can set the number of clusters automatically; however, they incur unreasonable processing times when classifying large volumes of data. In this paper, we propose a novel solution based on a kd-tree to determine the number of clusters k for massive data, whether for data preprocessing in data science projects or in near-real-time applications. We demonstrate how our solution outperforms current solutions in terms of both clustering quality and processing time on massive data.
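The abstract's starting point is that classic k-means requires k to be fixed in advance. For reference only, here is a minimal sketch of standard Lloyd's k-means in Python in which `k` is a required input. It is not the KD-means method proposed in the paper: the deterministic farthest-first seeding and all function names (`kmeans`, `_nearest`, `_farthest_first_init`) are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p: Point, centers: List[Point]) -> int:
    """Index of the center closest to p (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda j: _sq_dist(p, centers[j]))


def _farthest_first_init(data: List[Point], k: int) -> List[Point]:
    """Deterministic seeding (an assumption, not taken from the paper): start
    from data[0], then repeatedly add the point farthest from the chosen centers."""
    centers = [data[0]]
    while len(centers) < k:
        centers.append(max(data, key=lambda p: min(_sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means; the number of clusters k must be supplied by the caller."""
    data = [tuple(float(v) for v in p) for p in points]
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of points")
    centers = _farthest_first_init(data, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [_nearest(p, centers) for p in data]
        # Update step: move each center to the mean of its member points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty cluster: keep the old center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    # Final assignment so the labels always match the returned centers.
    labels = [_nearest(p, centers) for p in data]
    return centers, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4), (10.0, 10.0), (10.3, 9.8), (9.9, 10.2)]
    centers, labels = kmeans(pts, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Running the same data with a different `k` yields a different partition, which is the sensitivity the abstract points to. The paper's contribution is a kd-tree-based way to choose k automatically on massive data; this sketch does not attempt that and, being pure Python with O(n·k) work per iteration, is not meant for large volumes.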
Main file
paper7.pdf (1.04 MB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-03080514, version 1 (17-12-2020)

License

Attribution - NonCommercial - NoDerivatives

Identifiers

  • HAL Id: hal-03080514, version 1

Cite

Nabil El Malki, Franck Ravat, Olivier Teste. KD-means: clustering method for massive data based on kd-tree. 22nd International Workshop On Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP 2020), Mar 2020, Copenhagen, Denmark. ⟨hal-03080514⟩
