Oracle Inequalities and Optimal Inference under Group Sparsity

Abstract: We consider the problem of estimating a sparse linear regression vector $\beta^*$ under a Gaussian noise model, for the purposes of both prediction and model selection. We assume that prior knowledge is available on the sparsity pattern: the set of variables is partitioned into prescribed groups, only a few of which are relevant in the estimation process. This group sparsity assumption suggests using the Group Lasso method to estimate $\beta^*$. We establish oracle inequalities for the prediction and $\ell_2$ estimation errors of this estimator. These bounds hold under a restricted eigenvalue condition on the design matrix. Under a stronger coherence condition, we derive bounds for the estimation error in mixed $(2,p)$-norms with $1\le p\leq \infty$. When $p=\infty$, this result implies that a thresholded version of the Group Lasso estimator selects the sparsity pattern of $\beta^*$ with high probability. Next, we prove that the rate of convergence of our upper bounds is optimal in a minimax sense, up to a logarithmic factor, for all estimators over a class of group sparse vectors. Furthermore, we establish lower bounds for the prediction and $\ell_2$ estimation errors of the usual Lasso estimator. Using this result, we demonstrate that the Group Lasso can achieve an improvement in the prediction and estimation properties as compared to the Lasso. An important application of our results is the problem of estimating multiple regression equations simultaneously, also known as multi-task learning. In this case, our results lead to refinements of the results in \cite{colt2009} and allow one to establish the quantitative advantage of the Group Lasso over the usual Lasso in the multi-task setting. Finally, within the same setting, we show how our results can be extended to more general noise distributions, of which we only require the fourth moment to be finite. To obtain this extension, we establish a new maximal moment inequality, which may be of independent interest.
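
For orientation, the Group Lasso estimator and the mixed $(2,p)$-norm mentioned in the abstract are commonly written as below. This is a sketch in notation assumed here rather than taken from the abstract: $y\in\mathbb{R}^N$ is the response vector, $X$ the $N\times K$ design matrix, $G_1,\dots,G_M$ the prescribed groups partitioning $\{1,\dots,K\}$, $\beta^j$ the sub-vector of $\beta$ indexed by $G_j$, and $\lambda_j>0$ the tuning parameters. The normalization of the quadratic term varies between references.

$$
\hat\beta \in \arg\min_{\beta\in\mathbb{R}^K}\Big\{ \frac{1}{N}\,\|y - X\beta\|_2^2 + 2\sum_{j=1}^{M}\lambda_j\,\|\beta^j\|_2 \Big\},
\qquad
\|\beta\|_{2,p} = \Big(\sum_{j=1}^{M}\|\beta^j\|_2^{\,p}\Big)^{1/p},
\qquad
\|\beta\|_{2,\infty} = \max_{1\le j\le M}\|\beta^j\|_2 .
$$

When every group contains a single variable, $\|\beta^j\|_2=|\beta_j|$ and the penalty becomes a weighted $\ell_1$ norm, so the estimator reduces to the usual Lasso.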
Document type:
Preprint, Working paper
37 pages. 2010

https://hal.archives-ouvertes.fr/hal-00501509
Contributor: Karim Lounici
Submitted on: Monday, 12 July 2010 - 11:49:08
Last modified on: Friday, 28 April 2017 - 01:07:57
Document(s) archived on: Thursday, 14 October 2010 - 15:32:20

File

lptv.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00501509, version 1

Citation

Karim Lounici, Alexandre Tsybakov, Massimiliano Pontil, Sara Van de Geer. Oracle Inequalities and Optimal Inference under Group Sparsity. 37 pages. 2010. <hal-00501509>
