Significance testing for variable selection in high-dimension

Assessing the uncertainty of conclusions derived from experimental data is challenging when the number of possible explanations is large compared to the number of experiments. We propose a new two-stage “screen and clean” procedure for assessing the uncertainty attached to the selection of relevant variables in high-dimensional regression problems. In this two-stage method, screening consists of selecting a subset of candidate variables with a sparsity-inducing penalized regression, while cleaning consists of discarding all variables that do not pass a significance test. This test was originally based on ordinary least squares regression. We propose to improve the procedure by conveying more information from the screening stage to the cleaning stage. Our cleaning stage is based on an adaptively penalized regression whose weights are adjusted in the screening stage. Our procedure is amenable to the computation of p-values, which makes it possible to control the False Discovery Rate. Our experiments show the benefits of our procedure: we observe a systematic improvement in sensitivity compared to the original procedure.
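
The two stages described above can be illustrated with a minimal sketch. It follows the original variant (Lasso screening, then an ordinary least squares significance test), not the adaptively penalized cleaning stage proposed in the paper, whose details the abstract does not give. Several choices are assumptions made for illustration only: the two stages use disjoint halves of the data, the p-values use a normal approximation to the t-test, the Benjamini-Hochberg correction is applied to the screened variables only, and the names `lasso_cd`, `screen_and_clean`, `benjamini_hochberg` and the penalty value `lam=0.3` are hypothetical.

```python
import math
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent on (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()          # invariant: resid == y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (j excluded).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update for coordinate j.
            new = math.copysign(max(abs(rho) - lam, 0.0), rho) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def screen_and_clean(X, y, lam, rng):
    """Return (indices kept by screening, their p-values from the cleaning stage)."""
    n = X.shape[0]
    perm = rng.permutation(n)
    i_screen, i_clean = perm[: n // 2], perm[n // 2:]   # assumed data split

    # Screening: sparsity-inducing penalized regression on the first half.
    Xs, ys = X[i_screen], y[i_screen]
    beta = lasso_cd(Xs - Xs.mean(axis=0), ys - ys.mean(), lam)
    support = np.flatnonzero(beta)
    if support.size == 0:
        return support, np.array([])

    # Cleaning: OLS restricted to the screened variables, on the second half.
    Xc = X[i_clean][:, support]
    Xc = Xc - Xc.mean(axis=0)
    yc = y[i_clean] - y[i_clean].mean()
    dof = len(i_clean) - support.size - 1
    if dof <= 0:
        raise ValueError("screening kept too many variables for an OLS test")
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ coef
    sigma2 = resid @ resid / dof
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.pinv(Xc.T @ Xc)))
    z = coef / std_err
    # Two-sided p-values, normal approximation to the t distribution.
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return support, pvals


def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.flatnonzero(below))   # largest rank meeting its threshold
        rejected[order[: k + 1]] = True
    return rejected


# Synthetic example: 50 candidate variables, only the first 3 are relevant.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = 3.0
y = X @ beta_true + rng.standard_normal(n)

support, pvals = screen_and_clean(X, y, lam=0.3, rng=rng)
discoveries = support[benjamini_hochberg(pvals, q=0.05)]
print("screened:", support.tolist())
print("kept after cleaning:", discoveries.tolist())
```

Splitting the data keeps the cleaning test independent of the screening step, which is what makes the p-values usable for False Discovery Rate control in this sketch. In practice the penalty would be tuned (for example by cross-validation) rather than fixed.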

Domains

Statistics [stat], Bioinformatics [q-bio.QM]

Contributor: Yves Grandvalet

https://hal.science/hal-01313310

Submitted on: Monday, May 9, 2016, 17:54:57

Last modified on: Wednesday, March 13, 2024, 15:42:03

Dates and versions

hal-01313310, version 1 (09-05-2016)

Identifiers

HAL Id: hal-01313310, version 1
DOI: 10.1109/CIBCB.2015.7300313

Cite

Jean-Michel Bécu, Yves Grandvalet, Christophe Ambroise, Cyril Dalmasso. Significance testing for variable selection in high-dimension. Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Aug 2015, Niagara Falls, Canada. pp.1-8, ⟨10.1109/CIBCB.2015.7300313⟩. ⟨hal-01313310⟩

Collections

CNRS UNIV-COMPIEGNE UNIV-EVRY INRA HEUDIASYC DI LAMME UNIV-PARIS-SACLAY INRAE ANR GS-ENGINEERING MATHNUM