Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?

Abstract: Principal component analysis (PCA) is a method of choice for dimension reduction. In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable for performing PCA on streaming and/or massive data. Despite the wide availability of recursive algorithms that can efficiently update the PCA when new data are observed, the literature offers little guidance on how to select a suitable algorithm for a given application. This paper reviews the main approaches to online PCA, namely perturbation techniques, incremental methods and stochastic optimisation, and compares the most widely employed techniques in terms of statistical accuracy, computation time and memory requirements using artificial and real data. Extensions of online PCA to missing data and to functional data are detailed. All studied algorithms are available in the package onlinePCA on CRAN.
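To make the idea of "updating the PCA when new data are observed" concrete, here is a minimal sketch of one representative of the stochastic optimisation family: Oja's stochastic gradient rule for the leading principal direction, combined with a recursive update of the sample mean. It is not the paper's implementation and not the API of the onlinePCA R package. The function names, the step size gamma_n = c / n, and the synthetic data stream are illustrative assumptions. Each observation is processed once and then discarded, so memory use is O(d) rather than O(n d).

```python
import math
import random


def normalize(v):
    """Return v scaled to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def oja_first_pc(stream, dim, c=1.0, seed=0):
    """Hypothetical helper: estimate the leading principal direction online.

    Assumption: step size gamma_n = c / n (illustrative, not tuned).
    Returns (unit direction estimate, running mean estimate).
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    # Random unit vector as the starting guess for the eigenvector.
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    for n, x in enumerate(stream, start=1):
        # Recursive mean update: m_n = m_{n-1} + (x_n - m_{n-1}) / n
        mean = [m + (xi - m) / n for m, xi in zip(mean, x)]
        xc = [xi - m for xi, m in zip(x, mean)]
        # Oja step: v <- v + gamma_n * (x_c . v) * x_c, then renormalize.
        proj = sum(a * b for a, b in zip(xc, v))
        gamma = c / n
        v = normalize([vi + gamma * proj * xi for vi, xi in zip(v, xc)])
    return v, mean


def synthetic_stream(n, seed=1):
    """Hypothetical test data: variance 9 along axis 0, variance 1 elsewhere."""
    rng = random.Random(seed)
    for _ in range(n):
        yield [5.0 + 3.0 * rng.gauss(0.0, 1.0),
               -2.0 + rng.gauss(0.0, 1.0),
               1.0 + rng.gauss(0.0, 1.0)]


if __name__ == "__main__":
    v, mean = oja_first_pc(synthetic_stream(2000), dim=3)
    print("direction:", [round(c, 3) for c in v])
    print("mean:", [round(m, 2) for m in mean])
```

On this synthetic stream the estimated direction should align closely (up to sign) with the first coordinate axis, which carries most of the variance. Extending the sketch to several components would require re-orthonormalizing a d x k matrix after each step, and other families (perturbation and incremental methods) update an eigendecomposition directly instead of taking gradient steps.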

https://hal.archives-ouvertes.fr/hal-01700948
Contributor: Sébastien Mazzarese
Submitted on: Monday, February 5, 2018 - 2:15:19 PM
Last modification on: Friday, October 19, 2018 - 10:48:48 AM

Citation

Hervé Cardot, David Degras. Online Principal Component Analysis in High Dimension: Which Algorithm to Choose? International Statistical Review, Wiley, 2018, 86 (1), pp. 29-50. ⟨10.1111/insr.12220⟩. ⟨hal-01700948⟩
