HAL open archive
Preprint, Working Paper. Year: 2022

A penalized criterion for selecting the number of clusters for K-medians

Abstract

Clustering is a common unsupervised machine learning technique for grouping data points based on similar features. We focus here on unsupervised clustering of contaminated data, i.e., the case where K-medians should be preferred to K-means because of its robustness. More precisely, we concentrate on a common question in clustering: how to choose the number of clusters? The answer proposed here is to treat the choice of the optimal number of clusters as the minimization of a risk function via penalization. In this paper, we obtain a suitable penalty shape for our criterion and derive an associated oracle-type inequality. Finally, the performance of this approach with different types of K-medians algorithms is compared in a simulation study with other popular techniques. All studied algorithms are available in the R package Kmedians on CRAN.
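To make the idea of selecting the number of clusters by penalized risk minimization concrete, here is a minimal Python sketch. It is not the method of the paper and not the API of the R package Kmedians. Several pieces are assumptions made purely for illustration: the linear penalty `lam * k` stands in for the penalty shape derived in the paper, the function names (`geometric_median`, `farthest_point_init`, `k_medians`, `select_k`) are hypothetical, and the deterministic farthest-point initialization is a simplification that is itself sensitive to outliers.

```python
import math


def geometric_median(points, n_iter=100, eps=1e-9):
    """Approximate the geometric median with Weiszfeld iterations."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    m = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    for _ in range(n_iter):
        # Inverse-distance weights; eps guards against division by zero.
        weights = [1.0 / max(math.dist(p, m), eps) for p in points]
        total = sum(weights)
        m = tuple(sum(w * p[d] for w, p in zip(weights, points)) / total
                  for d in range(dim))
    return m


def farthest_point_init(points, k):
    """Deterministic initialization (an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        # Add the point farthest from its nearest current center.
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def k_medians(points, k, n_iter=20):
    """Lloyd-style K-medians; returns (centers, empirical risk)."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Keep the previous center when a cluster is empty.
        centers = [geometric_median(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    # Empirical risk: mean Euclidean distance to the nearest center.
    risk = sum(min(math.dist(p, c) for c in centers) for p in points) / len(points)
    return centers, risk


def select_k(points, k_max, lam):
    """Return the K in 1..k_max minimizing risk(K) + lam * K.

    The linear penalty lam * K is a placeholder, not the paper's penalty.
    """
    scores = {k: k_medians(points, k)[1] + lam * k for k in range(1, k_max + 1)}
    return min(scores, key=scores.get), scores


# Toy data: three well-separated groups of four points each.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 0), (10, 1), (11, 0), (11, 1),
        (0, 10), (0, 11), (1, 10), (1, 11)]
best_k, scores = select_k(data, k_max=5, lam=0.5)
print(best_k, scores)
```

The empirical risk decreases as K grows, so minimizing it alone would always favor the largest K; the penalty term is what makes the minimizer informative. In practice the penalty constant (here `lam`) would need to be calibrated, which this sketch does not attempt.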
Main file
Kmedians.pdf (1.25 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03771959, version 1 (07-09-2022)
hal-03771959, version 2 (13-09-2022)
hal-03771959, version 3 (26-02-2024)

Cite

Antoine Godichon-Baggioni, Sobihan Surendran. A penalized criterion for selecting the number of clusters for K-medians. 2022. ⟨hal-03771959v2⟩