Model-Based Variable Decorrelation in Linear Regression

Clément Théry (1, 2), Christophe Biernacki (1, 2), Gaétan Loridant (3)
2 MODAL - MOdel for Data Analysis and Learning
Inria Lille - Nord Europe, LPP - Laboratoire Paul Painlevé - UMR 8524, CERIM - Santé publique : épidémiologie et qualité des soins-EA 2694, Polytech Lille, Université de Lille 1, IUT’A
Abstract: Linear regression outcomes (estimates, predictions) are known to be degraded by highly correlated covariates. Yet modern datasets tend to contain more and more highly correlated covariates, simply because the number of variables they contain keeps growing. We propose to model such correlations explicitly through a family of linear regressions between the covariates. The correlation structure is found with an MCMC algorithm that optimizes a specific BIC criterion. This hierarchical-like approach leads to a joint probability distribution on both the initial response variable and the linearly explained covariates. Marginalisation over the linearly explained covariates then produces a parsimonious, correlation-free regression model into which classical procedures for estimating regression coefficients, including any variable selection procedure, can be plugged. Experiments on both simulated data and real-life datasets from the steel industry, where correlated variables are frequent, show that this pretreatment-like method for covariates has two essential benefits: first, it makes the linear links between covariates readable; second, it significantly improves the efficiency of the classical estimation/selection methods applied afterwards. An R package (CorReg), available on CRAN, implements this new method.
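
The marginalisation step described in the abstract can be illustrated with a small NumPy sketch. It is not the CorReg API and does not perform the MCMC/BIC structure search: the sub-regression structure (x3 explained by x1 and x2) is assumed known, and all names and numerical values (simulate, ols, alpha, beta, noise_sub) are hypothetical choices made for the illustration. Substituting x3 = alpha1*x1 + alpha2*x2 + noise into the full regression gives a reduced model on (x1, x2) whose coefficients are beta1 + beta3*alpha1 and beta2 + beta3*alpha2.

```python
import numpy as np


def simulate(n=200, noise_sub=0.01, seed=0):
    """Simulate covariates in which x3 is almost a linear function of x1 and x2."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    # Assumed sub-regression between covariates: x3 = alpha1*x1 + alpha2*x2 + small noise.
    alpha = np.array([1.0, 1.0])
    x3 = alpha[0] * x1 + alpha[1] * x2 + noise_sub * rng.normal(size=n)
    X = np.column_stack([x1, x2, x3])
    # Assumed true coefficients of the full regression of y on (x1, x2, x3).
    beta = np.array([1.0, 2.0, 3.0])
    y = X @ beta + rng.normal(size=n)
    return X, y, alpha, beta


def ols(X, y):
    """Ordinary least squares without intercept (the simulated data are centred)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


X, y, alpha, beta = simulate()

# Marginal (reduced) model: drop the linearly explained covariate x3.
X_reduced = X[:, :2]

# Conditioning of the normal equations, before and after removing x3.
cond_full = np.linalg.cond(X.T @ X)
cond_reduced = np.linalg.cond(X_reduced.T @ X_reduced)

# In the marginal model, the coefficients of x1 and x2 absorb the effect of x3.
coef_reduced = ols(X_reduced, y)
marginal_target = beta[:2] + beta[2] * alpha

print("condition number, full model:   ", cond_full)
print("condition number, reduced model:", cond_reduced)
print("reduced OLS coefficients:       ", coef_reduced)
print("expected marginal coefficients: ", marginal_target)
```

In this toy setting the reduced design is far better conditioned than the full one, and any classical estimator or variable selection procedure could be applied to X_reduced in place of plain OLS.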
Document type:
Preprint, Working paper

Contributor: Christophe Biernacki
Submitted on: Wednesday, December 31, 2014 - 15:38:11
Last modified on: Wednesday, April 25, 2018 - 14:23:16


Files produced by the author(s)


  • HAL Id: hal-01099133, version 1



Clément Théry, Christophe Biernacki, Gaétan Loridant. Model-Based Variable Decorrelation in Linear Regression. 2014. 〈hal-01099133〉


