Model-Based Variable Decorrelation in Linear Regression

Clément Théry 1, 2, Christophe Biernacki 1, 2, Gaétan Loridant 3
2 MODAL - MOdel for Data Analysis and Learning
LPP - Laboratoire Paul Painlevé - UMR 8524, Inria Lille - Nord Europe, CERIM - Santé publique : épidémiologie et qualité des soins-EA 2694, Polytech Lille, Université de Lille 1, IUT’A
Abstract: Linear regression outcomes (estimates, predictions) are known to be damaged by highly correlated covariates. However, most modern datasets are expected to mechanically contain more and more highly correlated covariates as the number of variables they hold increases. We propose to explicitly model such correlations by a family of linear regressions between the covariates. The structure of correlations is found with an MCMC algorithm that aims to optimize a specific BIC criterion. This hierarchical-like approach leads to a joint probability distribution on both the initial response variable and the linearly explained covariates. Marginalisation over the linearly explained covariates then produces a parsimonious correlation-free regression model into which classical procedures for estimating regression coefficients, including any variable selection procedure, can be plugged. Both simulated and real-life datasets from the steel industry, where correlated variables are frequent, highlight that this pretreatment-like method for covariates has two essential benefits: first, it makes the linear links between covariates genuinely readable; second, it significantly improves the efficiency of the classical estimation/selection methods applied afterwards. An R package (CorReg), available on CRAN, implements this new method.
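The marginalisation step described in the abstract can be illustrated with a small simulation. The sketch below is not the CorReg package and does not search for the correlation structure with MCMC: it assumes the sub-regression (a third covariate that is nearly the sum of the first two) is already known, and every name, coefficient, and sample size in it is a hypothetical choice made for illustration. It compares ordinary least squares on all covariates with ordinary least squares on the marginal model that keeps only the two free covariates.

```python
import numpy as np


def simulate(n, rng, noise_sub=0.01):
    """Draw one dataset with an assumed, known sub-regression x3 = x1 + x2 + eps."""
    x_free = rng.normal(size=(n, 2))            # covariates kept in the marginal model
    x3 = x_free.sum(axis=1) + noise_sub * rng.normal(size=n)  # linearly explained covariate
    X = np.column_stack([x_free, x3])
    beta = np.array([1.0, 1.0, 1.0])            # hypothetical true coefficients
    y = X @ beta + rng.normal(size=n)
    return X, y


def ols(X, y):
    """Ordinary least squares coefficients (no intercept, data are centred by design)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
full, marginal = [], []
for _ in range(200):
    X, y = simulate(50, rng)
    full.append(ols(X, y))                      # regression on all three covariates
    marginal.append(ols(X[:, :2], y))           # regression after dropping x3

full = np.array(full)
marginal = np.array(marginal)

# Sampling variance of each coefficient estimate across the 200 replications.
full_var = full.var(axis=0)
marginal_var = marginal.var(axis=0)

print("variance, full model:    ", full_var)
print("variance, marginal model:", marginal_var)
print("mean marginal estimate:  ", marginal.mean(axis=0))
```

In this toy setting the marginal coefficients estimate the sum of each free covariate's own effect and the effect passed through the explained covariate (here 1 + 1 = 2), and their sampling variance is far smaller than in the full model, which is the benefit the abstract attributes to removing the linearly explained covariates before estimation.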
Document type:
Preprint, working paper
2014


https://hal.archives-ouvertes.fr/hal-01099133
Contributor: Christophe Biernacki
Submitted on: Wednesday, 31 December 2014 - 15:38:11
Last modified on: Wednesday, 14 December 2016 - 01:07:56

File

article.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01099133, version 1

Citation

Clément Théry, Christophe Biernacki, Gaétan Loridant. Model-Based Variable Decorrelation in Linear Regression. 2014. <hal-01099133>

Metrics

Record views: 242

Document downloads: 219