On estimation of the noise variance in high-dimensional probabilistic principal component analysis
Abstract
In this paper, we develop new statistical theory for probabilistic principal component analysis (PCA) models in high dimensions. The focus is the estimation of the noise variance, which is an important and unresolved issue when the number of variables is large in comparison with the sample size. We first explain the reasons for a widely observed downward bias of the maximum likelihood estimator of the noise variance when the data dimension is high. We then propose a bias-corrected estimator using random matrix theory and establish its asymptotic normality. The superiority of the new (bias-corrected) estimator over existing alternatives is first checked by Monte Carlo experiments with various combinations of $(p, n)$ (dimension and sample size). To demonstrate further potential benefits of these results for probabilistic PCA in general, we provide evidence of clear improvements in two popular procedures (Ulfarsson and Solo, 2008; Bai and Ng, 2002) for determining the number of principal components when the variance estimator proposed by the respective authors is replaced by the bias-corrected estimator. The new estimator is also used to derive new asymptotics for the related goodness-of-fit statistic under the high-dimensional scheme.
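The downward bias mentioned in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes the standard probabilistic PCA result that the maximum likelihood estimator of the noise variance is the average of the $p - m$ smallest eigenvalues of the sample covariance matrix, and the function names, spike values, and $(p, n)$ settings are hypothetical choices made for illustration only. It does not implement the bias-corrected estimator.

```python
import numpy as np


def ppca_noise_mle(X, m):
    """Noise-variance MLE for probabilistic PCA with m components.

    Assumes the standard form: the mean of the p - m smallest
    eigenvalues of the (1/n-normalised) sample covariance matrix.
    """
    n, p = X.shape
    S = np.cov(X, rowvar=False, bias=True)  # p x p sample covariance
    eigvals = np.linalg.eigvalsh(S)         # returned in ascending order
    return eigvals[: p - m].mean()


def mean_mle(p, n, sigma2=1.0, spikes=(25.0, 16.0), reps=50, seed=0):
    """Average the MLE over `reps` simulated data sets.

    Hypothetical setup: the population covariance is diagonal, with
    m = len(spikes) eigenvalues equal to spike + sigma2 and the
    remaining p - m eigenvalues equal to sigma2.
    """
    rng = np.random.default_rng(seed)
    m = len(spikes)
    variances = np.concatenate([np.asarray(spikes) + sigma2,
                                np.full(p - m, sigma2)])
    sd = np.sqrt(variances)
    estimates = [
        ppca_noise_mle(rng.standard_normal((n, p)) * sd, m)
        for _ in range(reps)
    ]
    return float(np.mean(estimates))


if __name__ == "__main__":
    # True noise variance is 1.0; compare the average MLE as p/n grows.
    for p, n in [(50, 100), (100, 100), (200, 100)]:
        print(f"p={p:4d} n={n:4d}  mean MLE = {mean_mle(p, n):.4f}")
```

Under these assumptions, the averaged estimate should fall below the true value of 1.0, with the gap depending on the ratio $p/n$; this is the kind of bias the paper sets out to explain and correct.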
Origin: Files produced by the author(s)