Bayesian mixture models (in)consistency for the number of clusters - HAL open archive
Preprint, Working Paper Year: 2022

Bayesian mixture models (in)consistency for the number of clusters

Abstract

Bayesian nonparametric mixture models are common for modeling complex data. While these models are well-suited for density estimation, their application for clustering has some limitations. Miller and Harrison (2014) proved posterior inconsistency in the number of clusters when the true number of clusters is finite for Dirichlet process and Pitman-Yor process mixture models. In this work, we extend this result to additional Bayesian nonparametric priors such as Gibbs-type processes and finite-dimensional representations of them. The latter include the Dirichlet multinomial process and the recently proposed Pitman-Yor and normalized generalized gamma multinomial processes. We show that mixture models based on these processes are also inconsistent in the number of clusters and discuss possible solutions. Notably, we show that a post-processing algorithm introduced by Guha et al. (2021) for the Dirichlet process extends to more general models and provides a consistent method to estimate the number of components.
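
For intuition only, the following minimal sketch simulates the number of clusters under the Dirichlet process prior through its Chinese restaurant process representation. It illustrates the prior behaviour (the number of clusters keeps growing with the sample size), not the posterior inconsistency result of the paper, and it does not implement the post-processing algorithm of Guha et al. (2021). The function names `expected_num_clusters` and `sample_num_clusters` and the concentration parameter name `alpha` are assumptions made for this illustration.

```python
import random


def expected_num_clusters(n, alpha):
    """Prior mean of the number of clusters among n observations.

    Under a Dirichlet process with concentration alpha, observation i + 1
    opens a new cluster with probability alpha / (alpha + i), independently
    of the past, so the mean is the sum of these probabilities.
    """
    return sum(alpha / (alpha + i) for i in range(n))


def sample_num_clusters(n, alpha, rng):
    """Draw the number of occupied clusters from a Chinese restaurant process."""
    k = 0
    for i in range(n):
        # The first observation (i = 0) always opens a cluster.
        if rng.random() < alpha / (alpha + i):
            k += 1
    return k


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
    for n in (10, 100, 1000, 10000):
        mean = expected_num_clusters(n, alpha=1.0)
        draw = sample_num_clusters(n, alpha=1.0, rng=rng)
        print(f"n={n:6d}  prior mean={mean:6.2f}  one draw={draw}")
```

The prior mean grows roughly like alpha times log n, which gives some intuition for why a finite true number of clusters is hard to recover without additional post-processing.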
Main file
2210.14201.pdf (670.32 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03866434, version 1 (22-11-2022)
hal-03866434, version 2 (22-02-2023)

Cite

Louise Alamichel, Daria Bystrova, Julyan Arbel, Guillaume Kon Kam King. Bayesian mixture models (in)consistency for the number of clusters. 2022. ⟨hal-03866434v1⟩