Journal article, Electronic Journal of Statistics, Year: 2016

Kernel spectral clustering of large dimensional data

Romain Couillet
Florent Benaych-Georges

Abstract

This article proposes a first analysis of kernel spectral clustering methods in the regime where the dimension p of the data vectors to be clustered and their number n grow large at the same rate. We demonstrate, under a k-class Gaussian mixture model, that the normalized Laplacian matrix associated with the kernel matrix asymptotically behaves similarly to a so-called spiked random matrix. Some of the isolated eigenvalue-eigenvector pairs in this model are shown to carry the clustering information, subject to a separability condition that is classical in spiked matrix models. We precisely evaluate the position of these eigenvalues and the content of the eigenvectors, which unveils important (sometimes quite disruptive) aspects of kernel spectral clustering from both theoretical and practical standpoints. Our results are then compared to the actual clustering performance on images from the MNIST database, revealing a close match between theory and practice.
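
To make the setting in the abstract concrete, the following is a minimal sketch of kernel spectral clustering on a two-class Gaussian mixture with n and p of comparable size. Everything specific here is an assumption chosen for illustration and is not taken from the abstract: the kernel form K_ij = f(||x_i - x_j||^2 / p), the Gaussian choice f(t) = exp(-t/2), the scaling n D^{-1/2} K D^{-1/2} for the normalized Laplacian, the class means, the sizes n = 200 and p = 100, and the helper names `normalized_laplacian` and `two_class_spectral_labels`.

```python
import numpy as np


def normalized_laplacian(X, f):
    """Return n * D^{-1/2} K D^{-1/2}, with K_ij = f(||x_i - x_j||^2 / p).

    The kernel form and the factor n are assumed conventions for this sketch.
    """
    n, p = X.shape
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative round-off
    K = f(sq_dists / p)
    d = K.sum(axis=1)  # degrees; positive because the kernel is positive
    inv_sqrt_d = 1.0 / np.sqrt(d)
    return n * (inv_sqrt_d[:, None] * K * inv_sqrt_d[None, :])


def two_class_spectral_labels(L):
    """Split the points by the sign of the second dominant eigenvector.

    The dominant eigenvector is D^{1/2} 1 and carries no class information,
    so for two balanced classes the next one is used.
    """
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    second = eigvecs[:, -2]
    return (second > 0).astype(int), eigvals


# Hypothetical two-class mixture: means +mu and -mu, identity covariance.
rng = np.random.default_rng(0)
n, p = 200, 100
mu = np.zeros(p)
mu[0] = 3.0
truth = np.repeat([0, 1], n // 2)
X = rng.standard_normal((n, p)) + np.where(truth[:, None] == 0, mu, -mu)

L = normalized_laplacian(X, lambda t: np.exp(-t / 2.0))
labels, eigvals = two_class_spectral_labels(L)

# Cluster labels are only defined up to a swap of the two classes.
agreement = np.mean(labels == truth)
accuracy = max(agreement, 1.0 - agreement)
print("largest eigenvalues:", eigvals[-3:])
print("clustering accuracy:", accuracy)
```

With this scaling the largest eigenvalue equals n by construction, and the eigenvalue just below it plays the role of an isolated "spike" when the means are sufficiently separated. Shrinking the entry of `mu` toward zero is a simple way to watch that eigenvalue merge into the bulk and the accuracy fall toward chance.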
Main file
couillet_spectral_clustering.pdf (787.09 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01215343, version 1 (19-05-2020)

Cite

Romain Couillet, Florent Benaych-Georges. Kernel spectral clustering of large dimensional data. Electronic Journal of Statistics, 2016, 10 (1), pp.1393-1454. ⟨10.1214/16-EJS1144⟩. ⟨hal-01215343⟩