# On the sign recovery by LASSO, thresholded LASSO and thresholded Basis Pursuit Denoising

Abstract : In the high-dimensional regression model Y = Xβ + ε, we provide new theoretical results on the probability of recovering the sign of β by the Least Absolute Shrinkage and Selection Operator (LASSO) and by the thresholded LASSO. It is well known that "irrepresentability" is a necessary condition for LASSO to recover the sign of β with a large probability. In this article we extend this result by providing a tight upper bound for the probability of LASSO sign recovery. This upper bound is smaller than 1/2 when the irrepresentable condition does not hold, and thus generalizes Theorem 2 of Wainwright [27]. The bound depends on the tuning parameter λ and is attained when the non-null components of β tend to infinity; its value equals the limit of the probability that every null component of β is correctly estimated at 0. Consequently, this bound makes it possible to select λ so as to control the probability of at least one false discovery. "Irrepresentability" is a stringent necessary condition for sign recovery by LASSO, and it can be substantially relaxed when the LASSO estimates are additionally filtered with an appropriately selected threshold. In this article we provide new theoretical results on thresholded LASSO and thresholded Basis Pursuit DeNoising (BPDN) in the asymptotic setup in which X is fixed and the non-null components of β tend to infinity. Compared to the classical asymptotics, where X is an n × p matrix and both n and p tend to +∞, our approach reduces the technical burden. Our main theorem takes a simple form: when the non-null components of β are sufficiently large, appropriately thresholded LASSO or thresholded BPDN can recover the sign of β if and only if β is identifiable with respect to the L1 norm, i.e. if Xγ = Xβ and γ ≠ β, then ‖γ‖₁ > ‖β‖₁. To illustrate our results we present examples of irrepresentability and identifiability curves for some selected design matrices X.
These curves give the proportion of k-sparse vectors β for which the irrepresentability and identifiability conditions hold. Our examples illustrate that "irrepresentability" is a much stronger condition than "identifiability", especially when the entries in each row of X are strongly correlated. Finally, we illustrate how the knockoff methodology [1, 8] can be used to select an appropriate threshold, and show that thresholded BPDN and thresholded LASSO can recover the sign of β with a larger probability than adaptive LASSO [32].

Corresponding author: tardivel@math.uni.wroc.pl
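The contrast between irrepresentability and thresholding can be seen on a small toy design. The following is a minimal sketch, not the authors' code. It assumes the LASSO objective ½‖y − Xb‖₂² + λ‖b‖₁ solved by plain cyclic coordinate descent, and measures irrepresentability as ‖X_{S̄}ᵀ X_S (X_Sᵀ X_S)⁻¹ sign(β_S)‖_∞, where S is the support of β (the condition holds when this quantity is at most 1; some authors require a strict inequality). The helper names `irrepresentability`, `lasso_cd` and `thresholded_sign` are hypothetical, and the example is noiseless (ε = 0) so that its output is deterministic.

```python
import numpy as np


def irrepresentability(X, beta, tol=1e-12):
    """Return ||X_{S^c}^T X_S (X_S^T X_S)^{-1} sign(beta_S)||_inf with S = supp(beta).

    Assumes X_S has full column rank and beta has at least one null entry.
    """
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    S = np.abs(beta) > tol
    XS, XSc = X[:, S], X[:, ~S]
    # Solve (X_S^T X_S) w = sign(beta_S) rather than forming the inverse.
    w = np.linalg.solve(XS.T @ XS, np.sign(beta[S]))
    return float(np.max(np.abs(XSc.T @ (XS @ w))))


def lasso_cd(X, y, lam, n_sweeps=500):
    """Cyclic coordinate descent for min_b 0.5*||y - X b||_2^2 + lam*||b||_1.

    Assumes every column of X is nonzero; no convergence check is performed.
    """
    X = np.asarray(X, dtype=float)
    r = np.asarray(y, dtype=float).copy()  # residual y - X b, with b = 0
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)  # squared column norms ||X_j||^2
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of X_j with the partial residual that excludes b_j.
            rho = X[:, j] @ r + col_sq[j] * b[j]
            # Soft-thresholding update of the j-th coordinate.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r += X[:, j] * (b[j] - new)  # keep r = y - X b up to date
            b[j] = new
    return b


def thresholded_sign(b_hat, tau):
    """Sign vector of b_hat after setting entries with |b_j| <= tau to 0."""
    b_hat = np.asarray(b_hat, dtype=float)
    return np.where(np.abs(b_hat) > tau, np.sign(b_hat), 0.0)


if __name__ == "__main__":
    X = np.array([[1.0, 2.0], [0.0, 1.0]])  # invertible, strongly correlated columns
    beta = np.array([10.0, 0.0])
    y = X @ beta  # noiseless response (epsilon = 0)
    print(irrepresentability(X, beta))  # 2.0 > 1: irrepresentability fails
    b_hat = lasso_cd(X, y, lam=1.0)
    print(np.round(b_hat, 6))  # [7. 1.]: LASSO makes a false discovery
    print(thresholded_sign(b_hat, tau=2.0))  # [1. 0.]: sign recovered
```

In this toy example, with β = (b, 0) and b > 3λ, the KKT conditions give the LASSO solution (b − 3λ, λ), so the null coefficient is never estimated at 0, whatever the size of b. Because X is invertible, β is trivially identifiable with respect to the L1 norm, and any threshold τ with λ < τ < b − 3λ recovers the sign (1, 0). This deterministic illustration only mirrors the qualitative message; the results above concern the probability of sign recovery under noise.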
Document type : Preprints, Working Papers, ...

Cited literature [32 references]

https://hal.archives-ouvertes.fr/hal-01956603
Contributor : Patrick Tardivel
Submitted on : Wednesday, June 26, 2019 - 10:41:40 AM
Last modification on : Saturday, November 21, 2020 - 9:54:03 AM

### File

thresholded_LASSO_HAL_V4.pdf
Files produced by the author(s)

### Identifiers

• HAL Id : hal-01956603, version 4

### Citation

Patrick Tardivel, Malgorzata Bogdan. On the sign recovery by LASSO, thresholded LASSO and thresholded Basis Pursuit Denoising. 2019. ⟨hal-01956603v4⟩
