HAL: hal-00704514, version 1

Advances in Data Analysis and Classification 5, 3 (2011) 231-246
Multiple imputation in principal component analysis
Julie Josse 1, 2, Jérôme Pagès 1, 2, François Husson 1, 2
(2011)

The available methods to handle missing values in principal component analysis only provide point estimates of the parameters (axes and components) and estimates of the missing values. To take into account the variability due to missing values, a multiple imputation method is proposed. First, a method to generate multiple imputed data sets from a principal component analysis model is defined. Then, two ways to visualize the uncertainty due to missing values on the principal component analysis results are described. The first consists in projecting the imputed data sets onto a reference configuration as supplementary elements to assess the stability of the individuals (respectively of the variables). The second consists in performing a principal component analysis on each imputed data set and fitting each resulting configuration onto the reference one with Procrustes rotation. The latter strategy makes it possible to assess the variability of the principal component analysis parameters induced by the missing values. The methodology is then evaluated on a real data set.
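The Procrustes fitting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes two-dimensional configurations (for example, individuals' coordinates on the first two axes), allows rotation only (no scaling, no reflection), and the function name `procrustes_rotate_2d` and the toy coordinates are hypothetical.

```python
import math


def procrustes_rotate_2d(config, reference):
    """Rotate a 2-D configuration to best match a reference (least squares).

    Assumption: rotation only, no scaling and no reflection. Both
    configurations are centered on their own centroid before fitting.
    Returns the rotated (centered) configuration and the angle in radians.
    """
    def center(points):
        n = len(points)
        mx = sum(p[0] for p in points) / n
        my = sum(p[1] for p in points) / n
        return [(p[0] - mx, p[1] - my) for p in points]

    a = center(config)
    b = center(reference)

    # The angle that minimizes the sum of squared distances between
    # R(theta) * a_i and b_i has this closed form in two dimensions.
    num = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a, b))
    den = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    theta = math.atan2(num, den)

    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * ax - s * ay, s * ax + c * ay) for ax, ay in a]
    return rotated, theta


# Toy reference configuration (hypothetical coordinates of 4 individuals).
reference = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

# Hypothetical configuration from one imputed data set: here simply the
# reference turned by 30 degrees, to mimic the rotational freedom of PCA axes.
angle = math.pi / 6
imputed_config = [
    (math.cos(angle) * x - math.sin(angle) * y,
     math.sin(angle) * x + math.cos(angle) * y)
    for x, y in reference
]

fitted, theta = procrustes_rotate_2d(imputed_config, reference)
```

Repeating this fit for each imputed configuration and overlaying the fitted points on the reference map gives one cloud of points per individual, whose spread reflects the uncertainty due to the missing values. In more than two dimensions, or when axes may be reflected, the rotation is usually obtained from a singular value decomposition instead of the closed-form angle used here.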
1:  Institut de Recherche Mathématique de Rennes (IRMAR)
CNRS : UMR6625 – Université de Rennes 1 – École normale supérieure de Cachan - ENS Cachan – Institut National des Sciences Appliquées (INSA) - Rennes – Université de Rennes II - Haute Bretagne
2:  Agrocampus Ouest
Institut supérieur des sciences agronomiques, agroalimentaires, horticoles et du paysage – Ministère de l'agriculture, de l'agroalimentaire et de la forêt
Statistique
Mathematics/Statistics

Statistics/Statistics Theory
Principal component analysis – Missing values – EM algorithm – Multiple imputation – Bootstrap – Procrustes rotation