Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation

Sylvain Arlot 1, 2, 3 Matthieu Lerasle 4
2 SIERRA - Statistical Machine Learning and Parsimony
DI-ENS - Département d'informatique de l'École normale supérieure, ENS Paris - École normale supérieure - Paris, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique : UMR8548
Abstract: This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares loss of the selected estimator. We first prove a non-asymptotic oracle inequality for V-fold cross-validation and its bias-corrected version (V-fold penalization). In particular, this result implies that V-fold penalization is asymptotically optimal in the nonparametric case. Then, we compute the variance of V-fold cross-validation and related criteria, as well as the variance of key quantities for model selection performance. We show that these variances depend on V like 1+4/(V-1), at least in some particular cases, suggesting that the performance improves substantially from V=2 to V=5 or 10, and then is almost constant. Overall, this can explain the common advice to take V=5 (at least in our setting and when the computational power is limited), as supported by some simulation experiments. An oracle inequality and exact formulas for the variance are also proved for Monte-Carlo cross-validation, also known as repeated cross-validation, where the parameter V is replaced by the number B of random splits of the data.
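As a rough illustration of the procedure the abstract describes, the sketch below selects the number of bins of a regular histogram on [0, 1] by V-fold cross-validation with the least-squares contrast ||t||^2 - 2 t(x), and evaluates the factor 1+4/(V-1) quoted above. The restriction to regular histograms on [0, 1], the function names (`variance_factor`, `vfold_ls_criterion`, `select_n_bins`) and the Beta-distributed sample are assumptions made for this example only; they are not taken from the paper, and the paper states the 1+4/(V-1) dependence only for some particular cases.

```python
import numpy as np


def variance_factor(V):
    """Factor 1 + 4/(V-1) quoted in the abstract (requires V >= 2)."""
    return 1.0 + 4.0 / (V - 1)


def vfold_ls_criterion(x, n_bins, folds):
    """V-fold CV estimate of the least-squares risk (up to a constant)
    of the regular histogram with n_bins bins on [0, 1].

    folds is a list of index arrays forming a partition of range(len(x)).
    """
    n = len(x)
    # Bin index of each observation; x == 1.0 goes into the last bin.
    bins = np.minimum((x * n_bins).astype(int), n_bins - 1)
    crit = 0.0
    for val_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[val_idx] = False
        counts = np.bincount(bins[train_mask], minlength=n_bins)
        # Histogram heights: (training count) / (n_train * bin width).
        heights = counts * n_bins / train_mask.sum()
        # Integral of the squared histogram over [0, 1].
        norm_sq = np.sum(heights ** 2) / n_bins
        # Least-squares contrast averaged over the held-out block.
        crit += norm_sq - 2.0 * heights[bins[val_idx]].mean()
    return crit / len(folds)


def select_n_bins(x, candidates, V, seed=0):
    """Return the candidate number of bins minimizing the V-fold criterion.

    The same random partition into V blocks is used for every candidate.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), V)
    scores = [vfold_ls_criterion(x, d, folds) for d in candidates]
    return candidates[int(np.argmin(scores))]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = rng.beta(2.0, 5.0, size=500)  # hypothetical data on [0, 1]
    print("selected bins:", select_n_bins(sample, list(range(1, 31)), V=5))
    for V in (2, 5, 10, 20):
        print(V, round(variance_factor(V), 3))
```

The factor equals 5 at V=2, 2 at V=5 and about 1.44 at V=10, which is the sense in which most of the gain is obtained by V=5 or 10. The cost of the selection step grows linearly with V, since each candidate model is refitted once per block.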

https://hal.archives-ouvertes.fr/hal-00743931
Contributor: Sylvain Arlot
Submitted on: Friday, October 9, 2015 - 22:02:08
Last modified on: Friday, January 12, 2018 - 01:56:00
Document(s) archived on: Sunday, January 10, 2016 - 10:42:40

Files

penvfreech8_hal.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00743931, version 3
  • arXiv: 1210.5830

Citation

Sylvain Arlot, Matthieu Lerasle. Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation. 2015. 〈hal-00743931v3〉

