| HAL : hal-00347811, version 2 |
| arXiv : 0812.3141 |
| Available versions: | v1 (16-12-2008) | v2 (04-06-2010) |
| Choosing a penalty for model selection in heteroscedastic regression |
| Sylvain Arlot 1, 2 |
| (16/12/2008) |
| We consider the problem of choosing between several models in least-squares regression with heteroscedastic data. We prove that any penalization procedure is suboptimal when the penalty is a function of the dimension of the model only, at least for some typical heteroscedastic model selection problems. In particular, Mallows' Cp is suboptimal in this framework. By contrast, optimal model selection is possible with data-driven penalties such as resampling or $V$-fold penalties. Therefore, it is worth estimating the shape of the penalty from the data, even at the price of a higher computational cost. Simulation experiments illustrate the existence of a trade-off between statistical accuracy and computational complexity. Finally, we sketch some rules for choosing a penalty in least-squares regression, depending on what is known about possible variations of the noise level. |
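The abstract contrasts a penalty that depends only on the model dimension (Mallows' Cp) with data-driven penalties such as the $V$-fold penalty. The Python sketch below is a hypothetical illustration of that contrast, not the paper's code. It selects among histogram regression models ("regressograms") on [0, 1] with d1 regular bins on [0, 1/2) and d2 regular bins on [1/2, 1]. It compares Cp, using an assumed crude noise-level estimate, with a $V$-fold penalty computed with the usual constant V - 1. The signal, the noise levels, the block assignment and the empty-bin fallback are all assumptions of this sketch.

```python
import math
import random


def bin_index(x, model):
    """Bin of x in [0, 1] for the partition with d1 regular bins on [0, 1/2)
    and d2 regular bins on [1/2, 1], where model = (d1, d2)."""
    d1, d2 = model
    if x < 0.5:
        return min(int(2 * x * d1), d1 - 1)
    return d1 + min(int((2 * x - 1) * d2), d2 - 1)


def fit_regressogram(xs, ys, model):
    """Least-squares estimator on a histogram model: the mean response on each bin.
    Empty bins fall back to the global mean (an assumption of this sketch)."""
    d = sum(model)
    sums, counts = [0.0] * d, [0] * d
    for x, y in zip(xs, ys):
        k = bin_index(x, model)
        sums[k] += y
        counts[k] += 1
    fallback = sum(ys) / len(ys)
    means = [s / c if c else fallback for s, c in zip(sums, counts)]
    return lambda x: means[bin_index(x, model)]


def emp_risk(f, xs, ys):
    """Empirical least-squares risk: mean of (y - f(x))^2."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cp_criterion(xs, ys, model, sigma2):
    """Mallows' Cp: empirical risk + 2 * sigma2 * D / n, with D = d1 + d2."""
    f = fit_regressogram(xs, ys, model)
    return emp_risk(f, xs, ys) + 2.0 * sigma2 * sum(model) / len(xs)


def vfold_criterion(xs, ys, model, v):
    """Empirical risk + V-fold penalty: (V - 1)/V * sum over blocks j of
    [risk on all data - risk on training data] of the estimator fitted
    without block j. Blocks are {i : i mod V == j} (an assumption)."""
    n = len(xs)
    f = fit_regressogram(xs, ys, model)
    pen = 0.0
    for j in range(v):
        tr_x = [xs[i] for i in range(n) if i % v != j]
        tr_y = [ys[i] for i in range(n) if i % v != j]
        fj = fit_regressogram(tr_x, tr_y, model)
        pen += emp_risk(fj, xs, ys) - emp_risk(fj, tr_x, tr_y)
    return emp_risk(f, xs, ys) + (v - 1) / v * pen


def select(crit, models):
    """Model minimizing the criterion."""
    return min(models, key=crit)


def signal(t):
    """Hypothetical regression function used in the demo."""
    return math.sin(4 * math.pi * t)


if __name__ == "__main__":
    # Hypothetical heteroscedastic setting: X uniform on [0, 1],
    # noise standard deviation 0.2 on [0, 1/2) and 1.0 on [1/2, 1].
    random.seed(0)
    n = 500
    models = [(d1, d2) for d1 in range(1, 13) for d2 in range(1, 13)]
    xs = [random.random() for _ in range(n)]
    ys = [signal(x) + (0.2 if x < 0.5 else 1.0) * random.gauss(0.0, 1.0) for x in xs]
    # Crude estimate of the average noise variance for Cp, from the largest model.
    big = fit_regressogram(xs, ys, (12, 12))
    sigma2 = emp_risk(big, xs, ys) * n / (n - 24)
    grid = [(k + 0.5) / 1000 for k in range(1000)]

    def true_loss(model):
        """Squared L2 distance to the signal under the uniform design."""
        f = fit_regressogram(xs, ys, model)
        return sum((f(t) - signal(t)) ** 2 for t in grid) / len(grid)

    for name, crit in [("Cp", lambda m: cp_criterion(xs, ys, m, sigma2)),
                       ("5-fold penalty", lambda m: vfold_criterion(xs, ys, m, 5))]:
        m = select(crit, models)
        print(f"{name}: model = {m}, loss = {true_loss(m):.4f}")
    print(f"oracle: model = {select(true_loss, models)}")
```

Running it prints the model chosen by each criterion, its loss, and the oracle model for one random sample; no particular outcome is claimed for a single seed. The design point is that Cp charges the same price per bin wherever the bin lies, whereas the cost of a bin on the noisier half is roughly proportional to the local noise variance; the $V$-fold penalty estimates this shape from the data, at the cost of V extra fits per model.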
| 1: | Laboratoire d'informatique de l'école normale supérieure (LIENS) |
| CNRS : UMR8548 – Ecole Normale Supérieure de Paris - ENS Paris |
| 2: | WILLOW (INRIA Rocquencourt) |
| INRIA – Ecole Normale Supérieure de Paris - ENS Paris – Ecole des Ponts ParisTech – CNRS : UMR8548 |
| Willow |
| Domain: | Mathematics/Statistics; Statistics/Theory |
| non-parametric regression – model selection – penalization – heteroscedastic data – Mallows Cp – resampling penalties |
| http://hal.archives-ouvertes.fr/hal-00347811 |
| oai:hal.archives-ouvertes.fr:hal-00347811 |
| Contributor: Sylvain Arlot |
| Submitted on: Thursday, 3 June 2010, 19:24:45 |
| Last modified on: Friday, 4 June 2010, 10:33:42 |