Journal articles

Model-Based Clustering of High-Dimensional Data: A review

Abstract : Model-based clustering is a popular tool renowned for its probabilistic foundations and its flexibility. However, high-dimensional data are increasingly common and, unfortunately, classical model-based clustering techniques perform poorly in high-dimensional spaces. This is mainly because model-based clustering methods are dramatically over-parametrized in this setting. Nevertheless, high-dimensional spaces have specific characteristics that are useful for clustering, and recent techniques exploit them. After recalling the basics of model-based clustering, this article reviews dimension reduction approaches, regularization-based techniques, parsimonious modeling, subspace clustering methods and clustering methods based on variable selection. Existing software for model-based clustering of high-dimensional data is also reviewed, and its practical use is illustrated on real-world data sets.

Cited literature: 98 references

https://hal.archives-ouvertes.fr/hal-00750909
Contributor : Charles Bouveyron
Submitted on : Monday, November 12, 2012 - 4:31:30 PM
Last modification on : Friday, April 10, 2020 - 5:15:01 PM
Document(s) archived on : Wednesday, February 13, 2013 - 3:46:26 AM

File

hal_ReviewHD.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Charles Bouveyron, Camille Brunet. Model-Based Clustering of High-Dimensional Data: A review. Computational Statistics and Data Analysis, Elsevier, 2013, 71, pp.52-78. ⟨10.1016/j.csda.2012.12.008⟩. ⟨hal-00750909⟩

Metrics

Record views : 1562

File downloads : 6731