A survey of some sparse methods for high-dimensional data

Abstract: High-dimensional data are data in which the number of variables p is far larger than the number of observations n. This occurs in several fields, such as genomics and chemometrics. This didactic talk starts with a survey of various solutions in linear regression and then presents their extensions to unsupervised "sparse" methods for principal component analysis (PCA) and multiple correspondence analysis (MCA). When p > n, the OLS estimator does not exist for linear regression, since the cross-product matrix of the predictors is singular. Because this is a case of forced multicollinearity, one may use regularized techniques such as ridge regression, principal component regression or PLS regression: these methods provide rather robust estimates, either through dimension reduction or through constraints (explicit or not) on the regression coefficients. The fact that all the predictors are kept is often considered a positive point. However, if p ≫ n it becomes a drawback, since a combination of all the variables cannot be interpreted. Sparse combinations, i.e. combinations with a large number of zero coefficients, are preferred. The lasso, the elastic net and sparse PLS perform regularization and variable selection simultaneously thanks to non-quadratic penalties: L1, SCAD, etc. We will present variants such as the group lasso, for variables that are structured in blocks. In PCA, the singular value decomposition shows that if we regress the principal components onto the input variables, the vector of regression coefficients is equal to the factor loadings. It therefore suffices to adapt sparse regression techniques to obtain sparse versions of PCA and of PCA with groups of variables. We conclude with a presentation of a sparse version of multiple correspondence analysis.
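
The main points of the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the talk: the simulated data, the penalty values and the helper names `soft_threshold` and `lasso_cd` are assumptions made for illustration only. The sketch checks that the predictors are rank deficient when p > n, contrasts a ridge fit (all coefficients kept) with a lasso fit obtained by cyclic coordinate descent (many coefficients exactly zero), and verifies that regressing the first principal component onto the input variables returns the first loading vector.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent (illustrative helper).

    Minimises (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-column scale x_j'x_j / n
    r = y.copy()                        # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]         # remove the contribution of variable j
            rho = X[:, j] @ r / n       # covariance of x_j with the partial residual
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]         # put the updated contribution back
    return b


# Simulated data with p > n (assumed values, for illustration only).
rng = np.random.default_rng(0)
n, p = 20, 100
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                     # centre the columns
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]        # only three active variables
y = X @ beta_true + 0.1 * rng.standard_normal(n)
y -= y.mean()

# X'X has the same rank as X, which cannot exceed n, so it is singular
# and the OLS normal equations have no unique solution.
rank = np.linalg.matrix_rank(X)

# Ridge: a quadratic penalty makes the system solvable but keeps every variable.
ridge = np.linalg.solve(X.T @ X + 1.0 * np.eye(p), X.T @ y)

# Lasso: the L1 penalty sets many coefficients exactly to zero.
lasso = lasso_cd(X, y, lam=0.5)

n_nonzero_ridge = np.count_nonzero(ridge)
n_nonzero_lasso = np.count_nonzero(lasso)

# PCA as a regression: the minimum-norm least-squares regression of the first
# principal component on X gives back the first right singular vector.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]                              # first loading vector
pc1 = X @ v1                            # first principal component
coef = np.linalg.lstsq(X, pc1, rcond=None)[0]

print("rank of X:", rank, "with p =", p)
print("non-zero coefficients, ridge:", n_nonzero_ridge, "lasso:", n_nonzero_lasso)
print("regression coefficients equal loadings:", np.allclose(coef, v1))
```

Replacing the least-squares step in the last part with a sparse regression such as `lasso_cd` is the idea behind the sparse versions of PCA mentioned in the abstract; the exact formulations presented in the talk may differ from this sketch.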
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01126236
Contributor : Laboratoire Cedric
Submitted on : Friday, March 6, 2015 - 11:46:59 AM
Last modification on : Monday, February 10, 2020 - 6:24:26 PM

Identifiers

  • HAL Id : hal-01126236, version 1

Citation

Gilbert Saporta. A survey of some sparse methods for high-dimensional data. SADA'13, Mar 2013, Cotonou, Benin. ⟨hal-01126236⟩
