Preprint, Working Paper. Year: 2008

Optimal cross-validation in density estimation

Alain Celisse

Abstract

The performance of cross-validation (CV) is analyzed in two contexts: (i) risk estimation and (ii) model selection in the density estimation framework. The main focus is on one CV algorithm called leave-$p$-out (Lpo), where $p$ denotes the cardinality of the test set. Closed-form expressions are derived for the Lpo estimator of the risk of projection estimators, which makes V-fold cross-validation completely useless. From a theoretical point of view, these closed-form expressions make it possible to study the performance of Lpo in terms of risk estimation. For instance, the optimality of leave-one-out (Loo), that is Lpo with $p=1$, is proved among CV procedures. Two model selection frameworks are also considered: estimation and identification. In contrast with risk estimation, Loo is proved to be suboptimal as a model selection procedure. In the estimation framework with finite sample size $n$, optimality is achieved for $p$ large enough (with $p/n = o(1)$) to balance overfitting. A link is also identified between the optimal $p$ and the structure of the model collection. These theoretical results are strongly supported by simulation experiments. When performing identification, model consistency is also proved for Lpo with $p/n\to 1$ as $n\to +\infty$.
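To make the Lpo risk estimator concrete, here is a minimal Python sketch for one special case: a regular histogram on $[0, 1)$ under quadratic loss, where the criterion for each split is $\|\hat{s}\|^2 - \frac{2}{p}\sum_{i \in \text{test}} \hat{s}(X_i)$ with $\hat{s}$ fitted on the $n-p$ training points. The function names, the restriction to $[0, 1)$, and the equal-width bins are assumptions made for illustration. The closed form in `lpo_closed_form` is worked out here for this histogram case from the hypergeometric moments of the training bin counts; it is not quoted from the paper, whose expressions cover general projection estimators.

```python
from itertools import combinations


def bin_index(x, n_bins):
    """Index of the equal-width bin of [0, 1) that contains x."""
    return min(int(x * n_bins), n_bins - 1)


def bin_counts(data, n_bins):
    """Number of points falling in each of the n_bins bins."""
    counts = [0] * n_bins
    for x in data:
        counts[bin_index(x, n_bins)] += 1
    return counts


def lpo_brute_force(data, n_bins, p):
    """Average the quadratic-loss criterion over all test sets of size p."""
    n, h = len(data), 1.0 / n_bins
    total, n_splits = 0.0, 0
    for test_idx in combinations(range(n), p):
        held_out = set(test_idx)
        train = [x for i, x in enumerate(data) if i not in held_out]
        # Histogram density fitted on the n - p training points only.
        dens = [c / (len(train) * h) for c in bin_counts(train, n_bins)]
        sq_norm = sum(d * d * h for d in dens)
        test_mean = sum(dens[bin_index(data[i], n_bins)] for i in test_idx) / p
        total += sq_norm - 2.0 * test_mean
        n_splits += 1
    return total / n_splits


def lpo_closed_form(data, n_bins, p):
    """Same quantity without enumerating splits (histogram case only)."""
    n, h = len(data), 1.0 / n_bins
    sum_sq_freq = sum((c / n) ** 2 for c in bin_counts(data, n_bins))
    denom = (n - 1) * (n - p) * h
    return (2 * n - p) / denom - n * (n - p + 1) / denom * sum_sq_freq
```

The brute-force version enumerates all $\binom{n}{p}$ splits, so it is only usable for tiny samples; the closed form costs one pass over the data regardless of $p$, which is the practical point behind the remark on V-fold cross-validation. A bin count could then be chosen, for example, with `min(range(1, 20), key=lambda m: lpo_closed_form(data, m, p))`.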
Main file
cvhistoAOS_HAL.pdf (484.79 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00337058 , version 1 (05-11-2008)
hal-00337058 , version 2 (14-04-2009)
hal-00337058 , version 3 (30-03-2012)

Cite

Alain Celisse. Optimal cross-validation in density estimation. 2008. ⟨hal-00337058v3⟩