Model-based Clustering of High-dimensional Data Streams with Online Mixture of Probabilistic PCA

Abstract: Model-based clustering is a popular tool, renowned for its probabilistic foundations and its flexibility. However, model-based clustering techniques usually perform poorly on high-dimensional data streams, which are now a common data type. To overcome this limitation, we propose an online inference algorithm for the mixture of probabilistic PCA model. The proposed algorithm relies on an EM-based procedure and on a probabilistic and incremental version of PCA. Model selection is also considered in the online setting through parallel computing. Numerical experiments on simulated and real data demonstrate the effectiveness of our approach and compare it to state-of-the-art online EM-based algorithms.
Document type:
Journal article
Advances in Data Analysis and Classification, Springer Verlag, 2013, 7, pp.281-300

Cited literature [39 references]

https://hal.archives-ouvertes.fr/hal-00759945
Contributor: Charles Bouveyron
Submitted on: Monday, December 3, 2012 - 11:09:52
Last modified on: Sunday, February 8, 2015 - 01:01:24
Document(s) archived on: Monday, March 4, 2013 - 03:46:04

File

article_MPPCA.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00759945, version 1

Citation

Anastasios Bellas, Charles Bouveyron, Marie Cottrell, Jérôme Lacaille. Model-based Clustering of High-dimensional Data Streams with Online Mixture of Probabilistic PCA. Advances in Data Analysis and Classification, Springer Verlag, 2013, 7, pp.281-300. 〈hal-00759945〉

Metrics

Record views: 406

Document downloads: 496