Optimal cross-validation in density estimation

Alain Celisse

Preprint, Working Paper. Year: 2008

Alain Celisse

Role: Author
PersonId : 748170
IdHAL : alain-celisse

Laboratoire Paul Painlevé - UMR 8524

Abstract

The performance of cross-validation (CV) is analyzed in two contexts within the density estimation framework: (i) risk estimation and (ii) model selection. The main focus is on one CV algorithm called leave-$p$-out (Lpo), where $p$ denotes the cardinality of the test set. Closed-form expressions are derived for the Lpo estimator of the risk of projection estimators, which makes V-fold cross-validation completely useless. From a theoretical point of view, these closed-form expressions make it possible to study the performance of Lpo in terms of risk estimation. For instance, leave-one-out (Loo), that is, Lpo with $p=1$, is proved to be optimal among CV procedures for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. Unlike in risk estimation, Loo is proved to be suboptimal as a model selection procedure. In the estimation framework with finite sample size $n$, optimality is achieved for $p$ large enough (with $p/n = o(1)$) to balance overfitting. A link is also identified between the optimal $p$ and the structure of the model collection. These theoretical results are strongly supported by simulation experiments. For identification, model consistency is also proved for Lpo with $p/n\to 1$ as $n\to +\infty$.
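To illustrate what a closed-form Lpo risk estimator looks like, here is a minimal Python sketch for histograms, the simplest projection estimators. The formula is not quoted from this record: it is worked out here under the assumption of the usual least-squares contrast $\|\hat s\|^2 - 2 P_n \hat s$, using the hypergeometric moments of the training-set bin counts. The function names `lpo_risk_histogram` and `lpo_risk_brute_force` are hypothetical, and the brute-force version is included only to check the closed form on small samples.

```python
from itertools import combinations

import numpy as np


def lpo_risk_histogram(x, edges, p):
    """Closed-form leave-p-out estimate of the L2 risk (up to the constant
    ||s||^2) of the histogram built on the partition `edges`.

    Assumption (not taken from the record): least-squares contrast, with
    every observation falling inside [edges[0], edges[-1]].
    """
    x = np.asarray(x, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n = x.size
    if not 1 <= p <= n - 1:
        raise ValueError("p must satisfy 1 <= p <= n - 1")
    counts, _ = np.histogram(x, bins=edges)
    if counts.sum() != n:
        raise ValueError("all observations must fall inside the partition")
    widths = np.diff(edges)
    freq = counts / n
    linear = np.sum(freq / widths)          # sum of (n_k / n) / w_k
    quadratic = np.sum(freq**2 / widths)    # sum of (n_k / n)^2 / w_k
    denom = (n - 1) * (n - p)
    return (2 * n - p) / denom * linear - n * (n - p + 1) / denom * quadratic


def lpo_risk_brute_force(x, edges, p):
    """Same quantity, obtained by averaging over all test sets of size p.
    Cost grows like C(n, p); for checking small examples only."""
    x = np.asarray(x, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n = x.size
    widths = np.diff(edges)
    # Bin index of each point, with the same convention as np.histogram
    # (half-open bins, last bin closed on the right).
    bin_idx = np.clip(np.searchsorted(edges, x, side="right") - 1,
                      0, widths.size - 1)
    total, n_splits = 0.0, 0
    for test in combinations(range(n), p):
        test = np.array(test)
        train_mask = np.ones(n, dtype=bool)
        train_mask[test] = False
        train_counts = np.bincount(bin_idx[train_mask], minlength=widths.size)
        density = train_counts / ((n - p) * widths)   # histogram on training set
        norm_sq = np.sum(density**2 * widths)         # squared L2 norm
        total += norm_sq - 2.0 * density[bin_idx[test]].mean()
        n_splits += 1
    return total / n_splits
```

The closed form needs only the bin counts, so it costs one pass over the data for any $p$, whereas the brute-force average enumerates every split. A model (here, a partition) could then be chosen by minimizing `lpo_risk_histogram` over a collection of candidate `edges`.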

Keywords

density estimation; oracle inequality; projection estimators; concentration inequalities; cross-validation; leave-p-out; resampling; risk estimation; model selection

Domains

Statistics [math.ST]; Theory [stat.TH]

Main file

cvhistoAOS_HAL.pdf (484.79 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00337058

Submitted on: Friday, March 30, 2012, 17:08:28

Last modified on: Monday, February 12, 2024, 15:38:10

Long-term archiving on: Wednesday, December 14, 2016, 19:16:42

Dates and versions

hal-00337058 , version 1 (05-11-2008)

hal-00337058 , version 2 (14-04-2009)

hal-00337058 , version 3 (30-03-2012)

Identifiers

HAL Id : hal-00337058 , version 3
ARXIV : 0811.0802

Cite

Alain Celisse. Optimal cross-validation in density estimation. 2008. ⟨hal-00337058v3⟩

Collections

CNRS INSMI UNIV-LILLE LPP-MATH

292 views

212 downloads
