Variable clustering in high dimensional linear regression models - HAL open archive
Preprint, Working Paper Year: 2012

Variable clustering in high dimensional linear regression models

Abstract

Over the last three decades, many scientific fields have undergone profound changes caused by the advent of technologies for massive data collection. What was first seen as a blessing rapidly came to be known as the curse of dimensionality. Reducing the dimension has therefore become a challenge in statistical learning. In high dimensional linear regression models, the quest for parsimony has long been driven by the idea that a few relevant variables may be sufficient to describe the modeled phenomenon. Recently, a new paradigm was introduced in a series of articles from which the present work derives. We propose here a model that simultaneously performs variable clustering and regression. Our approach no longer considers the regression coefficients as fixed parameters to be estimated, but as unobserved random variables following a Gaussian mixture model. The latent partition is then determined by maximum likelihood, and predictions are obtained from the conditional distribution of the regression coefficients given the data. The number of latent components is chosen using a BIC criterion. Our model has very competitive predictive performance compared to standard approaches and brings significant improvements in interpretability.
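
The abstract gives no equations, so the following is only a minimal sketch of the kind of generative model it describes, under assumptions that are not taken from the paper: each regression coefficient is assigned to a latent group, drawn from a Gaussian centred on that group's mean with one shared within-group standard deviation, and the response is linear in the predictors with Gaussian noise. The function name `simulate_clustered_regression` and all of its parameters are hypothetical, chosen for illustration only.

```python
import random


def simulate_clustered_regression(n, p, means, weights, gamma, sigma,
                                  intercept=0.0, seed=0):
    """Simulate data from a linear model whose coefficients follow a
    Gaussian mixture (illustrative assumption, not the authors' code).

    n, p     : number of observations and of predictors
    means    : mean of each latent group of coefficients
    weights  : mixing proportions of the groups
    gamma    : within-group standard deviation (assumed shared by all groups)
    sigma    : standard deviation of the residual noise
    """
    rng = random.Random(seed)
    groups = list(range(len(means)))

    # Latent partition: one group label per regression coefficient.
    z = rng.choices(groups, weights=weights, k=p)

    # Each coefficient is a random variable centred on its group mean.
    beta = [rng.gauss(means[k], gamma) for k in z]

    # Standard normal predictors (an arbitrary choice for the simulation).
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

    # Linear response with Gaussian noise.
    y = [intercept + sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, sigma)
         for row in X]
    return X, y, beta, z


# Example: 20 observations, 50 predictors, 3 latent groups of coefficients.
X, y, beta, z = simulate_clustered_regression(
    n=20, p=50, means=[0.0, 2.0, -3.0], weights=[0.6, 0.2, 0.2],
    gamma=0.1, sigma=1.0,
)
```

With many more predictors than observations (here 50 against 20), only the group means, mixing proportions, and two variances parametrize the coefficients in this sketch, which illustrates how clustering the variables reduces the dimension of the problem.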
Main file
CLERE-Yengo-Jacques-Biernacki-HAL.pdf (233.77 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00764927, version 1 (13-12-2012)
hal-00764927, version 2 (02-08-2013)

Identifiers

  • HAL Id: hal-00764927, version 1

Cite

Loïc Yengo, Julien Jacques, Christophe Biernacki. Variable clustering in high dimensional linear regression models. 2012. ⟨hal-00764927v1⟩
543 Views
811 Downloads
