Understanding Big Data Spectral Clustering

Abstract: This article introduces an original approach to understanding the behavior of standard kernel spectral clustering algorithms (such as the Ng–Jordan–Weiss method) for large-dimensional datasets. Specifically, using advanced methods from random matrix theory and assuming Gaussian data vectors, we show that the Laplacian of the kernel matrix can asymptotically be well approximated by an analytically tractable equivalent random matrix. The study of the latter unveils the mechanisms at play, in particular the impact of the choice of the kernel function and some theoretical limits of the method. Despite our Gaussian assumption, we also observe that the predicted theoretical behavior closely matches that observed on real datasets (taken from the MNIST database).
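
For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal sketch of Ng–Jordan–Weiss-style kernel spectral clustering applied to synthetic Gaussian data. It is not the authors' code: the Gaussian kernel, the bandwidth sigma2 = p, the class means, the sample sizes, and the small k-means helper are all illustrative assumptions chosen here, not values taken from the paper.

```python
import numpy as np

def gaussian_kernel(X, sigma2):
    """Kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 * sigma2))."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip round-off negatives
    return np.exp(-sq_dists / (2.0 * sigma2))

def kmeans(Y, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c)**2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((Y[:, None, :] - centers[None, :, :])**2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def njw_spectral_clustering(X, k, sigma2):
    """Cluster the rows of X into k groups (Ng-Jordan-Weiss style)."""
    A = gaussian_kernel(X, sigma2)
    np.fill_diagonal(A, 0.0)                 # NJW sets A_ii = 0
    d = A.sum(axis=1)                        # degrees
    L = A / np.sqrt(np.outer(d, d))          # D^{-1/2} A D^{-1/2}
    _, eigvecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    V = eigvecs[:, -k:]                      # k dominant eigenvectors
    V = V / np.linalg.norm(V, axis=1, keepdims=True)  # row-normalise
    return kmeans(V, k)

# Hypothetical test data: two Gaussian classes with identity covariance.
rng = np.random.default_rng(0)
p, n_per_class = 20, 100
mu = np.full(p, 4.0 / np.sqrt(p))            # class means at +mu and -mu
X = np.vstack([rng.standard_normal((n_per_class, p)) + mu,
               rng.standard_normal((n_per_class, p)) - mu])
truth = np.repeat([0, 1], n_per_class)

labels = njw_spectral_clustering(X, k=2, sigma2=float(p))
agreement = np.mean(labels == truth)
accuracy = max(agreement, 1.0 - agreement)   # cluster labels are arbitrary
print(f"clustering accuracy: {accuracy:.3f}")
```

Shrinking the separation between the class means, or changing the kernel bandwidth, is one simple way to probe empirically the kind of sensitivity to the kernel function that the abstract describes.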
Document type:
Conference paper
2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Dec 2015, Cancun, Mexico. 2015


https://hal.archives-ouvertes.fr/hal-01205208
Contributor: Florent Benaych-Georges
Submitted on: Friday, September 25, 2015 - 10:11:23
Last modified on: Tuesday, October 11, 2016 - 14:55:26
Document(s) archived on: Tuesday, December 29, 2015 - 09:58:23

File

spectral_clustering_camsap.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01205208, version 1

Citation

Romain Couillet, Florent Benaych-Georges. Understanding Big Data Spectral Clustering. 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Dec 2015, Cancun, Mexico. 2015. <hal-01205208>
