Why V=5 is enough in V-fold cross-validation
Archive ouverte HAL. Preprint / working paper, 2014.

Abstract

This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V so as to minimize the least-squares loss of the selected estimator. We first prove a nonasymptotic oracle inequality for V-fold cross-validation and its bias-corrected version (V-fold penalization). In particular, this result implies that V-fold penalization is asymptotically optimal. We then compute the variance of V-fold cross-validation and related criteria, as well as the variance of key quantities for model selection performance. We show that these variances depend on V like 1 + 4/(V-1), at least in some particular cases, suggesting that performance improves considerably from V=2 to V=5 or 10 and is then almost constant. Overall, this explains the common advice to take V=5, at least in our setting and when computational power is limited, as confirmed by simulation experiments.
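The diminishing returns described in the abstract can be seen by tabulating the 1 + 4/(V-1) variance factor directly. A minimal sketch, assuming the factor can be treated as exact (the paper derives it only in some particular cases, so this is an illustration, not the authors' computation):

```python
def variance_factor(V: int) -> float:
    """Relative variance factor 1 + 4/(V-1) of V-fold criteria,
    as stated in the abstract (valid in particular cases)."""
    return 1.0 + 4.0 / (V - 1)

# The factor drops sharply from V=2 to V=5, then flattens out,
# which is the argument for taking V=5 under limited compute.
for V in (2, 5, 10, 20, 100):
    print(f"V={V:3d}  factor={variance_factor(V):.3f}")
```

For example, the factor is 5.0 at V=2, 2.0 at V=5, and about 1.44 at V=10: most of the variance reduction is already obtained at V=5.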
Main file: penvfreech6.pdf (1.18 MB). Origin: files produced by the author(s).

Dates and versions

hal-00743931 , version 1 (22-10-2012)
hal-00743931 , version 2 (17-07-2014)
hal-00743931 , version 3 (09-10-2015)

Cite

Sylvain Arlot, Matthieu Lerasle. Why V=5 is enough in V-fold cross-validation. 2014. ⟨hal-00743931v2⟩
803 views
495 downloads
