Significance testing for variable selection in high-dimension

Abstract: Assessing the uncertainty of conclusions drawn from experimental data is challenging when the number of possible explanations is large compared to the number of experiments. We propose a new two-stage “screen and clean” procedure for assessing the uncertainty in the selection of relevant variables in high-dimensional regression problems. In the screening stage, a sparsity-inducing penalized regression selects a subset of candidate variables; in the cleaning stage, all variables that do not pass a significance test are discarded. In the original procedure, this test was based on ordinary least squares regression. We propose to improve the procedure by conveying more information from the screening stage to the cleaning stage: our cleaning stage is based on an adaptively penalized regression whose weights are adjusted in the screening stage. Our procedure is amenable to the computation of p-values, which allows the False Discovery Rate to be controlled. In our experiments, we observe a systematic improvement in sensitivity compared to the original procedure.
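
To make the two-stage idea concrete, here is a minimal sketch of the baseline variant the abstract refers to (screening by a sparsity-inducing regression, then cleaning by an ordinary least squares significance test). It is not the authors' improved cleaning stage, which uses an adaptively penalized regression. Several choices are assumptions made for illustration only: the lasso as the screening regression, a random half/half sample split between the two stages, centered data with no intercept, a normal approximation for the p-values, and the Benjamini-Hochberg step-up rule for False Discovery Rate control. The names `lasso_cd`, `benjamini_hochberg` and `screen_and_clean` are hypothetical.

```python
import math
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent:
    minimize 1/(2n) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # residual y - X @ beta, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (variable j removed).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    if m == 0:
        return reject
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject


def screen_and_clean(X, y, lam=0.1, q=0.05, seed=0):
    """Return indices of variables kept after screening and cleaning."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    screen_idx, clean_idx = perm[: n // 2], perm[n // 2:]

    # Stage 1 (screen): keep variables with a nonzero lasso coefficient.
    support = np.flatnonzero(lasso_cd(X[screen_idx], y[screen_idx], lam))
    if support.size == 0 or support.size >= clean_idx.size:
        return support[:0]  # nothing to test, or OLS not identifiable

    # Stage 2 (clean): OLS on the held-out half, restricted to the support.
    Xc, yc = X[clean_idx][:, support], y[clean_idx]
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ coef
    sigma2 = resid @ resid / (clean_idx.size - support.size)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xc.T @ Xc)))

    # Two-sided p-values under a normal approximation of the test statistic.
    pvals = np.array([math.erfc(abs(z) / math.sqrt(2.0)) for z in coef / se])
    return support[benjamini_hochberg(pvals, q)]
```

Calling `screen_and_clean(X, y, lam=0.2, q=0.05)` on a centered design matrix `X` and response `y` returns the indices of the variables that survive both stages. In this sketch the only information passed from screening to cleaning is the selected support, which is the limitation the abstract proposes to address.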
Document type: Conference paper
Contributor: Yves Grandvalet
Submitted on: Monday, May 9, 2016 - 5:54:57 PM
Last modification on: Friday, July 20, 2018 - 11:13:37 AM



Jean-Michel Bécu, Yves Grandvalet, Christophe Ambroise, Cyril Dalmasso. Significance testing for variable selection in high-dimension. Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Aug 2015, Niagara Falls, Canada. pp.1-8, ⟨10.1109/CIBCB.2015.7300313⟩. ⟨hal-01313310⟩


