Model-based Clustering of High-dimensional Data Streams with Online Mixture of Probabilistic PCA

Abstract : Model-based clustering is a popular tool, valued for its probabilistic foundations and its flexibility. However, model-based clustering techniques usually perform poorly on high-dimensional data streams, which are now a common data type. To overcome this limitation, we propose an online inference algorithm for the mixture of probabilistic PCA model. The proposed algorithm relies on an EM-based procedure and on a probabilistic, incremental version of PCA. Model selection is also handled in the online setting through parallel computing. Numerical experiments on simulated and real data demonstrate the effectiveness of our approach and compare it to state-of-the-art online EM-based algorithms.
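
To give a rough idea of what an online EM-based procedure on a data stream can look like, here is a minimal sketch in Python. It is not the algorithm of the paper: it swaps the mixture of probabilistic PCA for a much simpler isotropic Gaussian mixture, leaves out the incremental PCA subspace update and the parallel model selection, and uses a generic stochastic-approximation update of running sufficient statistics. The class name OnlineIsotropicGMM, its parameters, and the step-size schedule are all assumptions made for illustration.

```python
import numpy as np


class OnlineIsotropicGMM:
    """Illustrative online EM for an isotropic Gaussian mixture.

    Hypothetical helper, NOT the paper's mixture of probabilistic PCA:
    each component has a mean and a single variance, with no subspace.
    """

    def __init__(self, init_means, init_var=1.0, step_exponent=0.6):
        self.mu = np.array(init_means, dtype=float)      # shape (K, d)
        n_components, dim = self.mu.shape
        self.pi = np.full(n_components, 1.0 / n_components)
        self.var = np.full(n_components, float(init_var))
        # Running sufficient statistics, initialised to match the parameters.
        self.s0 = self.pi.copy()
        self.s1 = self.pi[:, None] * self.mu
        self.s2 = self.pi * (dim * self.var + np.sum(self.mu ** 2, axis=1))
        self.t = 0
        self.step_exponent = step_exponent                # assumed schedule

    def responsibilities(self, x):
        """E-step for one observation: posterior component probabilities."""
        dim = self.mu.shape[1]
        sq_dist = np.sum((x - self.mu) ** 2, axis=1)
        log_p = (np.log(self.pi)
                 - 0.5 * dim * np.log(2.0 * np.pi * self.var)
                 - 0.5 * sq_dist / self.var)
        log_p -= log_p.max()                              # numerical stability
        r = np.exp(log_p)
        return r / r.sum()

    def partial_fit(self, x):
        """Process a single observation from the stream."""
        x = np.asarray(x, dtype=float)
        dim = self.mu.shape[1]
        self.t += 1
        gamma = (self.t + 1) ** (-self.step_exponent)     # decreasing step size
        r = self.responsibilities(x)
        # Stochastic-approximation update of the sufficient statistics.
        self.s0 += gamma * (r - self.s0)
        self.s1 += gamma * (r[:, None] * x - self.s1)
        self.s2 += gamma * (r * np.dot(x, x) - self.s2)
        # M-step: parameters as closed-form functions of the statistics.
        self.pi = self.s0 / self.s0.sum()
        self.mu = self.s1 / self.s0[:, None]
        raw_var = (self.s2 / self.s0 - np.sum(self.mu ** 2, axis=1)) / dim
        self.var = np.maximum(raw_var, 1e-6)              # variance floor
        return r
```

Each call to partial_fit consumes one observation and then discards it, so memory use does not grow with the length of the stream. In a mixture of probabilistic PCA, each component would additionally carry a low-dimensional loading matrix that has to be updated incrementally, which this sketch does not attempt.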

https://hal.archives-ouvertes.fr/hal-00759945
Contributor : Charles Bouveyron
Submitted on : Monday, December 3, 2012 - 11:09:52 AM
Last modification on : Sunday, January 19, 2020 - 6:38:32 PM
Long-term archiving on: Monday, March 4, 2013 - 3:46:04 AM

File

article_MPPCA.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00759945, version 1

Citation

Anastasios Bellas, Charles Bouveyron, Marie Cottrell, Jérôme Lacaille. Model-based Clustering of High-dimensional Data Streams with Online Mixture of Probabilistic PCA. Advances in Data Analysis and Classification, Springer Verlag, 2013, 7, pp.281-300. ⟨hal-00759945⟩
