Journal article, International Statistical Review, Year: 2018

Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?

Abstract

Principal component analysis (PCA) is a method of choice for dimension reduction. In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable to perform the PCA of streaming data and/or massive data. Despite the wide availability of recursive algorithms that can efficiently update the PCA when new data are observed, the literature offers little guidance on how to select a suitable algorithm for a given application. This paper reviews the main approaches to online PCA, namely, perturbation techniques, incremental methods and stochastic optimisation, and compares the most widely employed techniques in terms of statistical accuracy, computation time and memory requirements using artificial and real data. Extensions of online PCA to missing data and to functional data are detailed. All studied algorithms are available in the package onlinePCA on CRAN.
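To make the idea of a recursive update concrete, here is a minimal Python sketch of one stochastic optimisation scheme, an Oja-type update followed by QR re-orthonormalisation. It is an illustration only, not the paper's implementation and not the interface of the onlinePCA package: the function name `oja_update`, the step size 1/t, the synthetic data and all dimensions are assumptions chosen for the example, and the observations are assumed to be centred.

```python
import numpy as np

def oja_update(U, x, step):
    """One Oja-type step (hypothetical helper): U <- orth(U + step * x x^T U).

    U    : (d, q) matrix with orthonormal columns, current basis estimate
    x    : (d,) new observation, assumed already centred
    step : positive step size
    """
    U = U + step * np.outer(x, x @ U)   # stochastic gradient step
    Q, _ = np.linalg.qr(U)              # restore orthonormal columns
    return Q

rng = np.random.default_rng(0)
d, q, n = 20, 2, 5000                   # assumed dimensions for the demo

# Synthetic centred stream: two dominant coordinates, small noise elsewhere.
scales = np.concatenate(([5.0, 3.0], np.full(d - 2, 0.5)))

# Random orthonormal starting basis.
U = np.linalg.qr(rng.standard_normal((d, q)))[0]

for t in range(1, n + 1):
    x = scales * rng.standard_normal(d)  # one observation at a time
    U = oja_update(U, x, step=1.0 / t)   # only U (d x q) is kept in memory

# Share of the estimated basis lying in the first two coordinates;
# a value near q means the dominant subspace has been recovered.
alignment = np.sum(U[:2, :] ** 2)
```

Each observation is used once and then discarded, so memory use is of order d × q regardless of the number of observations. The decreasing step size 1/t is one common choice; other schedules trade convergence speed against stability.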
Main file
1511.03688.pdf (882.75 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-01700948 , version 1 (30-01-2024)

Cite

Hervé Cardot, David Degras. Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?. International Statistical Review, 2018, 86 (1), pp.29-50. ⟨10.1111/insr.12220⟩. ⟨hal-01700948⟩