On the sign recovery given by the thresholded LASSO and thresholded Basis Pursuit - HAL open archive
Preprint, Working Paper. Year: 2019

On the sign recovery given by the thresholded LASSO and thresholded Basis Pursuit

Patrick J C Tardivel
Małgorzata Bogdan
  • Role: Author

Abstract

We consider the regression model $Y = X\beta^* + \varepsilon$ when the number of observations $n$ is smaller than the number of explanatory variables $p$. It is well known that the popular Least Absolute Shrinkage and Selection Operator (LASSO) can recover the sign of $\beta^*$ only if a very stringent irrepresentable condition is satisfied. In this article, as a first step, we provide a new result about the irrepresentable condition: the probability of recovering the sign of $\beta^*$ with the LASSO is smaller than 1/2 once the irrepresentable condition does not hold. On the other hand, LASSO can consistently estimate $\beta^*$ under much weaker assumptions than the irrepresentable condition. This implies that appropriately thresholded LASSO can recover the sign of $\beta^*$ under such weaker assumptions (see e.g. [24] or [34]). In this article we revisit the properties of thresholded LASSO and provide new theoretical results in the asymptotic setup in which the design matrix is fixed and the magnitudes of the nonzero components of $\beta^*$ tend to infinity. Apart from LASSO, our results also cover basis pursuit, which can be thought of as the limiting case of LASSO when the tuning parameter tends to 0. Compared to the classical asymptotics with respect to $n$ and $p$, our approach reduces the technical burden. As a result, our main theorem takes a simple form: appropriately thresholded LASSO (with any given value of the tuning parameter) or thresholded basis pursuit can recover the sign of a sufficiently large signal if and only if $\beta^*$ is identifiable with respect to the $\ell_1$ norm, i.e. if $X\gamma = X\beta^*$ and $\gamma \neq \beta^*$ then $\|\gamma\|_1 > \|\beta^*\|_1$, or in other words, when $\beta^*$ can be recovered by solving the basis pursuit problem in the noiseless case. For any given design matrix $X$, we define the irrepresentability and identifiability curves. For a given integer $r$, these curves provide the proportion of $\beta^*$ having $r$ nonzeros for which, respectively, the irrepresentability and identifiability conditions hold.
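To make the thresholding idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the LASSO objective $\frac{1}{2}\|Y - Xb\|_2^2 + \lambda\|b\|_1$, solves it with plain proximal gradient descent (ISTA, a solver chosen here only for self-containment), and then keeps the sign of the coefficients whose magnitude exceeds a threshold $\tau$. The function names, the simulated design, and the values of `lam` and `tau` are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=5000):
    """Approximate argmin_b 0.5 * ||y - X b||_2^2 + lam * ||b||_1 with ISTA."""
    # Step size 1 / L, where L is the Lipschitz constant of the smooth part.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - step * grad, step * lam)
    return b


def thresholded_sign(b_hat, tau):
    """Sign vector of the estimate after zeroing entries with |b_j| <= tau."""
    return np.sign(np.where(np.abs(b_hat) > tau, b_hat, 0.0))


# Illustrative setup (every value below is an assumption): n < p,
# i.i.d. N(0, 1) design, three large nonzero components.
rng = np.random.default_rng(0)
n, p = 50, 100
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[[3, 17, 42]] = [10.0, -10.0, 10.0]
y = X @ beta_star + rng.standard_normal(n)

b_hat = lasso_ista(X, y, lam=20.0)
recovered = np.array_equal(thresholded_sign(b_hat, tau=5.0), np.sign(beta_star))
print("sign recovered:", recovered)
```

The threshold is set to half the signal magnitude purely for illustration; in practice the signal size is unknown and the threshold has to be chosen from the data.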
These curves illustrate that the irrepresentable condition is much stronger than the identifiability condition (and thus highlight our theoretical results), since the gap between the irrepresentability and identifiability curves is very large. One notices that the identifiability curve drops very quickly from 1 to 0. These numerical observations are not surprising when $X$ has i.i.d. $N(0, 1)$ entries. Indeed, when $n$ and $p$ are both large there exists a value $k_{tr} \in (0, 1)$ (given by the asymptotic transition curve [14]) such that the proportion of $\beta^*$ identifiable with respect to the $\ell_1$ norm is close to 1 (resp. close to 0) as soon as $r/n < k_{tr}$ (resp. $r/n > k_{tr}$). Surprisingly, contrary to classical assumptions (such as irrepresentability), the identifiability condition does not become very stringent when the entries of $X$ are extremely correlated. Indeed, the identifiability curve is the same when the entries of $X$ are extremely correlated as when $X$ has i.i.d. $N(0, 1)$ entries. In addition, when the entries of $X$ are positively correlated and the components of $\beta^*$ have the same sign, the identifiability curve lies well above the one associated with i.i.d. $N(0, 1)$ entries. Finally, we illustrate how the knockoff methodology [2, 9] can be used to select the appropriate threshold, and that thresholded basis pursuit and LASSO can recover the sign of $\beta^*$ with a larger probability than adaptive LASSO [38].
Corresponding author: tardivel@math.uni.wroc.pl
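As a rough illustration of how an irrepresentability curve could be estimated numerically, the sketch below draws random supports and sign patterns with $r$ nonzeros and reports the fraction for which the condition holds. It assumes the commonly stated non-strict form of the condition, $\|X_{\bar I}^\top X_I (X_I^\top X_I)^{-1} s\|_\infty \le 1$ with $X_I$ of full column rank, where $I$ is the support and $s$ the sign vector of $\beta^*$; the paper's exact definition and sampling scheme may differ. The function names and the uniform sampling of supports and signs are assumptions made for this example.

```python
import numpy as np


def irrepresentable_holds(X, support, signs):
    """Check ||X_out^T X_I (X_I^T X_I)^{-1} s||_inf <= 1 for support I, signs s."""
    X_in = X[:, support]
    gram = X_in.T @ X_in
    if np.linalg.matrix_rank(gram) < len(support):
        return False  # X_I must have full column rank
    mask = np.ones(X.shape[1], dtype=bool)
    mask[support] = False  # columns outside the support
    w = X_in @ np.linalg.solve(gram, signs)
    return bool(np.max(np.abs(X[:, mask].T @ w), initial=0.0) <= 1.0)


def irrepresentability_curve(X, r_values, n_draws=200, seed=0):
    """Monte Carlo estimate of the proportion of patterns satisfying the condition."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    curve = []
    for r in r_values:
        hits = 0
        for _ in range(n_draws):
            # Assumption: supports and signs are drawn uniformly at random.
            support = rng.choice(p, size=r, replace=False)
            signs = rng.choice([-1.0, 1.0], size=r)
            hits += irrepresentable_holds(X, support, signs)
        curve.append(hits / n_draws)
    return curve


# Illustrative design with i.i.d. N(0, 1) entries and n < p.
X_demo = np.random.default_rng(1).standard_normal((50, 100))
print(irrepresentability_curve(X_demo, r_values=[1, 5, 10]))
```

An identifiability curve could be estimated the same way by replacing the check with a basis pursuit solve in the noiseless case, which requires a linear programming solver and is left out here.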
Main file
active_set_version8.pdf (541.35 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01956603, version 1 (16-12-2018)
hal-01956603, version 2 (30-03-2019)
hal-01956603, version 3 (22-04-2019)
hal-01956603, version 4 (26-06-2019)
hal-01956603, version 5 (08-06-2020)
hal-01956603, version 6 (09-05-2021)
hal-01956603, version 7 (31-08-2021)

Identifiers

  • HAL Id: hal-01956603, version 2

Cite

Patrick J C Tardivel, Małgorzata Bogdan. On the sign recovery given by the thresholded LASSO and thresholded Basis Pursuit. 2019. ⟨hal-01956603v2⟩

Collections

INRA
317 views
841 downloads
