Impact of contamination on training and test error rates in statistical clustering

Christel Ruwet; Gentiane Haesbroeck

doi:10.1080/03610918.2010.542847

Journal article: Communications in Statistics - Simulation and Computation, Year: 2011

Christel Ruwet

Role: Corresponding author
PersonId: 919669

Mathematics

Gentiane Haesbroeck

Role: Author
PersonId: 919670

Mathematics

Abstract

The k-means algorithm is one of the most common nonhierarchical clustering methods. However, it is not robust to atypical observations. Alternative techniques, such as the generalized k-means procedure, have therefore been introduced. This paper focuses on the error rates these procedures achieve when the data are expected to follow a mixture distribution. Two definitions of the error rate are considered, depending on the data at hand.

Keywords

Physical Sciences

Domains

Computation [stat.CO]

Main file

PEER_stage2_10.1080%2F03610918.2010.542847.pdf (2.62 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-00666340

Submitted on: Saturday, 4 February 2012, 02:51:05

Last modified on: Wednesday, 9 November 2022, 13:42:09

Long-term archiving on: Wednesday, 14 December 2016, 04:10:42

Dates and versions

hal-00666340, version 1 (04-02-2012)

Identifiers

HAL Id: hal-00666340, version 1
DOI: 10.1080/03610918.2010.542847

Cite

Christel Ruwet, Gentiane Haesbroeck. Impact of contamination on training and test error rates in statistical clustering. Communications in Statistics - Simulation and Computation, 2011, 40 (03), pp.394-411. ⟨10.1080/03610918.2010.542847⟩. ⟨hal-00666340⟩

Collections

PEER

54 views

97 downloads