Impact of contamination on training and test error rates in statistical clustering
Abstract
The k-means algorithm is one of the most common nonhierarchical clustering methods. However, it is not robust to atypical observations. Alternative techniques have therefore been introduced, e.g. the generalized k-means procedure. This paper focuses on the error rate these procedures achieve when the data are assumed to follow a mixture distribution. Two definitions of the error rate, the training and the test error rate, are considered, depending on the data at hand.
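As a rough illustration of the setting, the sketch below fits plain 2-means to a one-dimensional, balanced mixture of two unit-variance Gaussians and compares the error rate on the fitting sample (training) with the error rate on a fresh clean sample (test), with and without a cluster of outliers. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the mixture parameters, the 5% point-mass contamination at 30, the choice to give the outliers label 1, and the exact way the two error rates are computed. The generalized k-means procedure is not implemented here.

```python
import random

def two_means_1d(xs, iters=50):
    """Lloyd's algorithm for k = 2 in one dimension; returns (low, high) centers."""
    c1, c2 = min(xs), max(xs)  # simple deterministic initialization (assumption)
    for _ in range(iters):
        cut = (c1 + c2) / 2
        left = [x for x in xs if x <= cut]
        right = [x for x in xs if x > cut]
        if not left or not right:
            break
        n1, n2 = sum(left) / len(left), sum(right) / len(right)
        if (n1, n2) == (c1, c2):  # partition no longer changes
            break
        c1, c2 = n1, n2
    return c1, c2

def error_rate(xs, labels, centers):
    """Share of points whose nearest center disagrees with the true label."""
    cut = (centers[0] + centers[1]) / 2
    wrong = sum((x > cut) != (lab == 1) for x, lab in zip(xs, labels))
    return wrong / len(xs)

def sample_mixture(rng, n, mu=2.0):
    """Balanced mixture of N(-mu, 1) and N(+mu, 1) with true labels 0 / 1."""
    labels = [rng.randrange(2) for _ in range(n)]
    xs = [rng.gauss(mu if lab else -mu, 1.0) for lab in labels]
    return xs, labels

rng = random.Random(0)
train_x, train_y = sample_mixture(rng, 2000)
test_x, test_y = sample_mixture(rng, 2000)

# Clean fit: training and test error rates on uncontaminated data.
clean_centers = two_means_1d(train_x)
clean_train_err = error_rate(train_x, train_y, clean_centers)
clean_test_err = error_rate(test_x, test_y, clean_centers)

# Contaminated fit: append 5% outliers at 30.0, labelled 1 (assumption).
n_out = 100
contam_x = train_x + [30.0] * n_out
contam_y = train_y + [1] * n_out
contam_centers = two_means_1d(contam_x)
contam_train_err = error_rate(contam_x, contam_y, contam_centers)
contam_test_err = error_rate(test_x, test_y, contam_centers)

print("clean:        train", clean_train_err, "test", clean_test_err)
print("contaminated: train", contam_train_err, "test", contam_test_err)
```

In this toy setup the outliers capture one of the two centers, so the cut point moves far outside the bulk of the data and nearly all clean observations fall into a single cluster. This is one way the lack of robustness can show up in the error rates; the paper's own definitions and results may differ.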
Domains
Computation [stat.CO]
Origin: Files produced by the author(s)