Impact of contamination on training and test error rates in statistical clustering - HAL open archive
Journal article, Communications in Statistics - Simulation and Computation, Year: 2011

Impact of contamination on training and test error rates in statistical clustering

Christel Ruwet
  • Role: Corresponding author
  • PersonId : 919669

Gentiane Haesbroeck
  • Role: Author
  • PersonId : 919670

Abstract

The k-means algorithm is one of the most common nonhierarchical clustering methods. However, this procedure is not robust with respect to atypical observations. Alternative techniques have thus been introduced, e.g. the generalized k-means procedure. This paper focuses on the error rate these procedures achieve when the data are expected to follow a mixture distribution. Two definitions of the error rate are considered, depending on the data at hand.
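To make the setting concrete, here is a minimal sketch of how contamination can affect the error rate of classical k-means on mixture data. It is not the authors' procedure or simulation design. The balanced two-component Gaussian mixture, the point-mass outliers at 15, the contamination fraction `eps`, and all function names are assumptions chosen for illustration. "Training" error is read here as the error on the sample used to fit the centers and "test" error as the error on a fresh uncontaminated sample, which is one plausible reading of the two definitions mentioned above. The generalized k-means procedure is not implemented.

```python
import numpy as np

def sample_mixture(n, rng, means=(-2.0, 2.0)):
    """Draw n points from a balanced two-component 1-D Gaussian mixture (assumed model)."""
    labels = rng.integers(0, 2, size=n)
    x = rng.normal(loc=np.asarray(means)[labels], scale=1.0)
    return x, labels

def kmeans_1d(x, k=2, n_iter=100):
    """Plain Lloyd iterations in one dimension, started from spread-out quantiles."""
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        assign = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            x[assign == j].mean() if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Sorted centers line up with the sorted component means (-2 < 2).
    return np.sort(centers)

def error_rate(x, labels, centers):
    """Share of points whose nearest center disagrees with their mixture component."""
    pred = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    return float(np.mean(pred != labels))

def run_experiment(eps=0.1, n=2000, outlier=15.0, seed=0):
    rng = np.random.default_rng(seed)
    x_train, y_train = sample_mixture(n, rng)
    x_test, y_test = sample_mixture(n, rng)
    # Assumed contamination scheme: overwrite a fraction eps of the training
    # values with a fixed outlier, keeping the labels they were drawn with.
    x_cont = x_train.copy()
    x_cont[: int(eps * n)] = outlier
    out = {}
    for name, x_fit in (("clean", x_train), ("contaminated", x_cont)):
        centers = kmeans_1d(x_fit)
        out[name] = {
            "training": error_rate(x_fit, y_train, centers),
            "test": error_rate(x_test, y_test, centers),
        }
    return out

results = run_experiment()
for name, errs in results.items():
    print(f"{name:>12}: training={errs['training']:.3f}  test={errs['test']:.3f}")
```

Under these assumptions, the fit on the contaminated sample should show a higher test error than the fit on the clean sample, because the outliers pull one center away from the mixture. Varying `eps` and `outlier` gives a rough feel for how sensitive the classical procedure is.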

Domains

Computation [stat.CO]
Main file
PEER_stage2_10.1080%2F03610918.2010.542847.pdf (2.62 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-00666340, version 1 (04-02-2012)

Cite

Christel Ruwet, Gentiane Haesbroeck. Impact of contamination on training and test error rates in statistical clustering. Communications in Statistics - Simulation and Computation, 2011, 40 (03), pp.394-411. ⟨10.1080/03610918.2010.542847⟩. ⟨hal-00666340⟩

Collections

PEER