Variable Clustering in High-Dimensional Linear Regression: The R Package clere

Loïc Yengo (1, 2), Julien Jacques (3, 1, 4), Christophe Biernacki (1, 4), Mickael Canouil (2)
1 MODAL - MOdel for Data Analysis and Learning
Inria Lille - Nord Europe, LPP - Laboratoire Paul Painlevé - UMR 8524, CERIM - Santé publique : épidémiologie et qualité des soins-EA 2694, Polytech Lille - École polytechnique universitaire de Lille, Université de Lille, Sciences et Technologies
Abstract: Dimension reduction is one of the biggest challenges in high-dimensional regression models. We recently introduced a new methodology that uses variable clustering to reduce dimensionality. Here we introduce an R package that implements two enhancements to this methodology. First, it reduces the computational time needed to estimate the parameters. Second, users can now constrain the model to identify variables with weak or no effect on the response. We give an overview of the package's functionalities, along with examples of how to run an analysis. Numerical experiments on simulated and real data illustrate the reduction in computational time and the good predictive performance of our method compared with standard dimension reduction approaches.
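The core idea behind reducing dimension through variable clustering can be seen in a small sketch. Suppose every variable in cluster k shares one coefficient b_k. Then sum_j beta_j x_ij = sum_k b_k (sum of x_ij over j in cluster k), so p predictors collapse into g summed predictors. Fixing one cluster's coefficient at 0 mirrors the abstract's second enhancement: those variables drop out as having no effect. This is a minimal illustration under strong assumptions. The cluster labels are taken as known, whereas the paper's method learns the grouping from the data. Within-cluster coefficients are exactly equal, and the fit is plain least squares. It is not the estimation procedure used by clere. The functions `cluster_sums`, `solve` and `ols_with_intercept` are hypothetical helpers, not part of the package's API.

```python
import random


def cluster_sums(X, labels, g, zero_group=None):
    """Collapse p predictors into one summed predictor per cluster.

    Assumes all variables in cluster k share coefficient b_k, so the
    regression needs only one column per cluster. The cluster `zero_group`
    (if given) has its coefficient fixed at 0 and is dropped entirely.
    """
    kept = [k for k in range(g) if k != zero_group]
    Z = []
    for row in X:
        sums = [0.0] * g
        for x, k in zip(row, labels):
            sums[k] += x
        Z.append([sums[k] for k in kept])
    return Z, kept


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def ols_with_intercept(Z, y):
    """Ordinary least squares via the normal equations (D'D) w = D'y."""
    D = [[1.0] + row for row in Z]
    q = len(D[0])
    DtD = [[sum(d[a] * d[c] for d in D) for c in range(q)] for a in range(q)]
    Dty = [sum(d[a] * yi for d, yi in zip(D, y)) for a in range(q)]
    return solve(DtD, Dty)


rng = random.Random(0)
labels = [0, 0, 1, 1, 2, 2]          # hypothetical cluster memberships
b_true = {0: 0.0, 1: 2.0, 2: -1.0}   # cluster 0: variables with no effect
X = [[rng.gauss(0, 1) for _ in labels] for _ in range(20)]
y = [1.0 + sum(b_true[k] * x for x, k in zip(row, labels)) for row in X]

# Six predictors reduce to two summed predictors once cluster 0 is fixed at 0.
Z, kept = cluster_sums(X, labels, g=3, zero_group=0)
coef = ols_with_intercept(Z, y)      # [intercept, b_1, b_2]
print(kept, [round(c, 6) for c in coef])
```

The simulated response has no noise, so the fit recovers the intercept 1 and the cluster coefficients 2 and -1 up to floating-point rounding. Only three parameters are estimated instead of seven.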

Cited literature: 23 references
Contributor: Julien Jacques
Submitted on: Monday, February 3, 2014 - 10:45:04 AM
Last modification on: Thursday, February 21, 2019 - 10:34:08 AM
Long-term archiving on: Sunday, April 9, 2017 - 5:53:00 AM


Files produced by the author(s)


HAL Id: hal-00940929, version 1


Loïc Yengo, Julien Jacques, Christophe Biernacki, Mickael Canouil. Variable Clustering in High-Dimensional Linear Regression: The R Package clere. The R Journal, R Foundation for Statistical Computing, 2016, 8 (1), pp.92-106. ⟨hal-00940929⟩


