Estimation of the conditional risk in classification: the swapping method
Abstract
The bias of the empirical error rate in supervised classification is studied. It is shown that this bias can be understood as a covariance between the classification rule and the labeling of the training data. From this result, a new penalized criterion is proposed to perform model selection in classification. Applications of the resulting algorithm to simulated and real data are presented.