Understanding Big Data Spectral Clustering - HAL open archive
Conference paper, Year: 2015

Romain Couillet
Florent Benaych-Georges
  • Role: Author
  • PersonId: 849874

Abstract

This article introduces an original approach to understanding the behavior of standard kernel spectral clustering algorithms (such as the Ng–Jordan–Weiss method) on large-dimensional datasets. Specifically, using advanced methods from random matrix theory and assuming Gaussian data vectors, we show that the Laplacian of the kernel matrix can be asymptotically well approximated by an analytically tractable equivalent random matrix. The study of the latter reveals the mechanisms at play, in particular the impact of the choice of kernel function, as well as some theoretical limits of the method. Despite our Gaussian assumption, we also observe that the predicted theoretical behavior closely matches that observed on real datasets (taken from the MNIST database).
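For readers unfamiliar with the pipeline the abstract refers to, the following is a minimal sketch of Ng–Jordan–Weiss-style kernel spectral clustering in Python (assuming NumPy), run on synthetic two-class Gaussian data. It is an illustration, not the authors' code, and it does not reproduce the paper's random matrix analysis. Several choices are assumptions made for this sketch: the Gaussian kernel exp(-t / (2 * sigma2)) applied to squared distances divided by the dimension p, the bandwidth `sigma2`, the farthest-point k-means initialization, and the hypothetical mixture parameters (`p`, `n_per_class`, `mu`).

```python
import numpy as np


def kmeans(Y, k, n_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        # Pick the point farthest from all centers chosen so far.
        dist_to_centers = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist_to_centers)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute the centers.
        d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def njw_spectral_clustering(X, k, sigma2=1.0):
    """NJW-style spectral clustering of the rows of X (n samples x p features)."""
    p = X.shape[1]
    # Pairwise squared distances divided by p (assumed large-dimensional scaling).
    sq = (X ** 2).sum(axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0) / p
    # Gaussian kernel matrix with zero diagonal.
    K = np.exp(-dist2 / (2.0 * sigma2))
    np.fill_diagonal(K, 0.0)
    # Normalized matrix D^{-1/2} K D^{-1/2}, with D the diagonal degree matrix.
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    L = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    # Eigenvectors of the k largest eigenvalues (eigh sorts eigenvalues ascending).
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, -k:]
    # Normalize each row to unit length, then cluster the rows with k-means.
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return kmeans(U, k)


# Hypothetical two-class Gaussian mixture: N(+mu, I_p) and N(-mu, I_p).
rng = np.random.default_rng(0)
p, n_per_class = 200, 100
mu = np.zeros(p)
mu[0] = 5.0
X = np.vstack([
    rng.standard_normal((n_per_class, p)) + mu,
    rng.standard_normal((n_per_class, p)) - mu,
])
y = np.repeat([0, 1], n_per_class)

labels = njw_spectral_clustering(X, k=2)
# Cluster indices are arbitrary, so score the better of the two label matchings.
accuracy = max(np.mean(labels == y), np.mean(labels != y))
print(f"accuracy: {accuracy:.3f}")
```

With the class means this far apart, the two classes should separate cleanly. Decreasing `mu[0]` makes the classes harder to distinguish, which gives a simple way to probe how clustering performance degrades.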
Main file
spectral_clustering_camsap.pdf (286.42 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01205208, version 1 (25-09-2015)

Identifiers

  • HAL Id: hal-01205208, version 1

Cite

Romain Couillet, Florent Benaych-Georges. Understanding Big Data Spectral Clustering. 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Dec 2015, Cancun, Mexico. ⟨hal-01205208⟩