Conference papers

Sparse principal component analysis for multiblock data and its extension to sparse multiple correspondence analysis

Abstract : Two new methods for selecting groups of variables in multiblock data have been developed: "Group Sparse Principal Component Analysis" (GSPCA) for continuous variables and "Sparse Multiple Correspondence Analysis" (SMCA) for categorical variables. GSPCA is a compromise between the sparse PCA method of Zou, Hastie and Tibshirani and the "group Lasso" method of Yuan and Lin. PCA is formulated as a regression-type optimization problem, and the group Lasso constraints on the regression coefficients produce modified principal components with sparse loadings. This reduces the number of nonzero coefficients, i.e. the number of selected groups. SMCA is a straightforward extension of GSPCA to groups of indicator variables, using the chi-square metric. Two real examples illustrate the methods. The first is a data set of 25 trace elements measured in three tissues of 48 crabs (25 blocks of 3 variables). The second is a data set of 502 women, aimed at identifying genes affecting skin aging, with more than 370,000 blocks, each block corresponding to an SNP (Single Nucleotide Polymorphism) coded into 3 categories.
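The abstract does not spell out the algorithm, so the following is only a minimal sketch of the idea for a single component, under stated assumptions. It assumes the alternating scheme of Zou, Hastie and Tibshirani's sparse PCA (a regression step for the loadings beta, then an update of the direction alpha), with the ridge term dropped and the lasso penalty replaced by a group-lasso penalty weighted by sqrt(block size), as in Yuan and Lin. The proximal-gradient solver, the function names (`block_soft_threshold`, `group_lasso`, `group_sparse_pc`), the penalty value `lam` and the synthetic data are hypothetical illustrations, not the authors' implementation. It shows how the block penalty zeroes whole blocks at once, so the nonzero blocks are the "selected groups".

```python
import numpy as np


def block_soft_threshold(z, t):
    """Proximal operator of t * ||z||_2: shrinks a whole block, setting it exactly to zero if ||z|| <= t."""
    norm = np.linalg.norm(z)
    if norm <= t:
        return np.zeros_like(z)
    return (1.0 - t / norm) * z


def group_lasso(X, y, groups, lam, n_iter=500):
    """Minimise 0.5*||y - X b||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2 by proximal gradient (ISTA).

    `groups` is a list of column-index lists that partitions the columns of X.
    (Solver choice is an assumption of this sketch.)
    """
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the squared-error gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y))  # gradient step on the smooth part
        for g in groups:  # block-wise shrinkage: a block is kept or dropped as a whole
            b[g] = block_soft_threshold(z[g], step * lam * np.sqrt(len(g)))
    return b


def group_sparse_pc(X, groups, lam, n_outer=50):
    """First group-sparse loading vector, alternating as in SPCA (one component, no ridge term)."""
    X = X - X.mean(axis=0)
    alpha = np.linalg.svd(X, full_matrices=False)[2][0]  # start from the ordinary first PC
    beta = np.zeros_like(alpha)
    for _ in range(n_outer):
        # beta-step: regress the current component X @ alpha on X under the group-lasso penalty
        beta = group_lasso(X, X @ alpha, groups, lam)
        if not beta.any():
            break  # penalty too large: no block selected
        # alpha-step: for one component, alpha = X'X beta / ||X'X beta||
        w = X.T @ (X @ beta)
        alpha = w / np.linalg.norm(w)
    norm = np.linalg.norm(beta)
    return beta / norm if norm > 0 else beta


# Hypothetical demo: 5 blocks of 3 variables; only blocks 0 and 1 share a latent factor.
rng = np.random.default_rng(0)
n, n_blocks, size = 100, 5, 3
f = rng.normal(size=(n, 1))
X = rng.normal(size=(n, n_blocks * size))
X[:, : 2 * size] += 3.0 * f
groups = [list(range(k * size, (k + 1) * size)) for k in range(n_blocks)]
beta = group_sparse_pc(X, groups, lam=500.0)
selected = [k for k, g in enumerate(groups) if np.linalg.norm(beta[g]) > 0]
print("selected blocks:", selected)
```

Larger values of `lam` drop more blocks. In SMCA the same block penalty would apply to the indicator columns of each categorical variable under the chi-square metric; that extension is not shown in this sketch.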

Cited literature [8 references]

https://hal.archives-ouvertes.fr/hal-01126171
Contributor : Laboratoire Cedric
Submitted on : Tuesday, March 24, 2020 - 10:13:24 AM
Last modification on : Saturday, March 28, 2020 - 7:10:56 PM

File

art_2625.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-01126171, version 1

Citation

Anne Bernard, Christiane Guinot, Gilbert Saporta. Sparse principal component analysis for multiblock data and its extension to sparse multiple correspondence analysis. Compstat 2012, Aug 2012, Limassol, Cyprus. pp.99-106. ⟨hal-01126171⟩
