HAL: hal-00455561, version 1
Journal de la SFdS, vol. 150, no. 2 (2009), pp. 28-51
Handling Missing Values in Principal Component Analysis (French title: Gestion des données manquantes en Analyse en Composantes Principales)
Julie Josse (1, 2), François Husson (2)
(2009)
A common approach to handling missing values in Principal Component Analysis (PCA) is to ignore them, optimizing the loss function over the observed (non-missing) entries only. This can be done with several methods, including NIPALS, weighted regression, and iterative PCA. Iterative PCA imputes the missing entries iteratively while the parameters are being estimated, and can be seen as a particular EM algorithm. We first review these approaches from the standpoint of the criterion they minimize; this presentation clarifies their properties and the difficulties they encounter. We then point out the problem of overfitting and show how the probabilistic formulation of PCA (Tipping & Bishop, 1997) provides a suitable and convenient regularization term to overcome it. Finally, the performance of the new algorithm is compared with that of the other algorithms through simulations.
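The iterative PCA scheme summarized above (impute, fit a PCA, re-impute from the fit) can be sketched in a few lines of Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name `iterative_pca_impute`, the column-mean initialization, the stopping tolerance and the `regularized` option are all hypothetical choices made for illustration. When `regularized=True`, each retained singular value d_s is shrunk by the factor (lambda_s - sigma^2)/lambda_s. This is the posterior-mean reconstruction of probabilistic PCA with its maximum-likelihood noise variance sigma^2, the mean of the discarded eigenvalues. The regularization term actually used in the paper may take a different form.

```python
import numpy as np


def iterative_pca_impute(X, n_components=2, regularized=False,
                         max_iter=1000, tol=1e-12):
    """Fill the np.nan cells of X from a rank-`n_components` PCA fit.

    Iterative (EM-style) PCA: start from the column means, then alternate
    (1) a PCA of the completed matrix and (2) overwriting only the missing
    cells with their fitted values, until the fitted matrix stops changing.
    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    S = n_components
    if not 0 < S < p:
        raise ValueError("need 0 < n_components < number of columns")
    missing = np.isnan(X)
    # Initial imputation: observed mean of each column.
    X_filled = np.where(missing, np.nanmean(X, axis=0), X)
    prev_fit = None
    for _ in range(max_iter):
        # Step 1: PCA of the completed data = SVD of the column-centred matrix.
        mu = X_filled.mean(axis=0)
        U, d, Vt = np.linalg.svd(X_filled - mu, full_matrices=False)
        d_kept = d[:S]
        if regularized:
            # Hypothetical option: n * sigma^2, where sigma^2 is the mean of
            # the discarded eigenvalues (probabilistic PCA ML estimate).
            n_sigma2 = np.sum(d[S:] ** 2) / (p - S)
            # Shrink d_s by (lambda_s - sigma^2) / lambda_s, clipped at zero;
            # the inner maximum only guards against dividing by zero.
            d_kept = np.maximum(d_kept - n_sigma2 / np.maximum(d_kept, 1e-12), 0.0)
        fit = mu + (U[:, :S] * d_kept) @ Vt[:S]
        # Step 2: observed cells stay fixed; missing cells take the fitted values.
        X_filled = np.where(missing, fit, X)
        if prev_fit is not None and np.sum((fit - prev_fit) ** 2) < tol:
            break
        prev_fit = fit
    return X_filled


if __name__ == "__main__":
    # Exactly rank-one data (plus column means) with one cell deleted.
    t = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
    v = np.array([1.0, 2.0, -1.0, 0.5])
    X = np.array([10.0, 20.0, 30.0, 40.0]) + np.outer(t, v)
    X[2, 1] = np.nan  # the true value was 20 + (-0.5) * 2 = 19.0
    print(iterative_pca_impute(X, n_components=1)[2, 1])
```

Observed cells are never modified; only the missing cells are refreshed from the current fit. This is what makes the procedure minimize the loss over the observed entries alone, in the EM spirit described above. On the exactly rank-one example, the unregularized sketch should return a value very close to the deleted 19.0.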
1: Laboratoire de Mathématiques Appliquées, Agrocampus Rennes
Institut supérieur des sciences agronomiques, agroalimentaires, horticoles et du paysage
2: Institut de Recherche Mathématique de Rennes (IRMAR)
CNRS: UMR6625, Université de Rennes 1, École normale supérieure de Cachan (ENS Cachan), Institut National des Sciences Appliquées de Rennes (INSA Rennes), Université de Rennes II - Haute Bretagne
3: Agrocampus Ouest
Institut supérieur des sciences agronomiques, agroalimentaires, horticoles et du paysage; Ministère de l'agriculture, de l'agroalimentaire et de la forêt
Statistics
Domain: Mathematics/Statistics; Statistics/Theory
Keywords: PCA; missing values; alternating weighted least squares; EM algorithm; GEM-PCA; overfitting; probabilistic PCA
hal-00455561, version 1
http://hal.archives-ouvertes.fr/hal-00455561
oai:hal.archives-ouvertes.fr:hal-00455561
Contributor: Maryse Collin
Submitted on: Wednesday, 10 February 2010, 16:24:56
Last modified on: Wednesday, 19 June 2013, 16:53:03