Understanding Big Data Spectral Clustering

Abstract: This article introduces an original approach to understanding the behavior of standard kernel spectral clustering algorithms (such as the Ng–Jordan–Weiss method) for large-dimensional datasets. Specifically, using advanced methods from random matrix theory and assuming Gaussian data vectors, we show that the Laplacian of the kernel matrix can be asymptotically well approximated by an analytically tractable equivalent random matrix. Analyzing the latter gives a deeper understanding of the mechanisms at play, in particular the impact of the choice of kernel function and some theoretical limits of the method. Despite the Gaussian assumption, we also observe that the predicted theoretical behavior closely matches that observed on real datasets (taken from the MNIST database).
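As a rough illustration of the kind of algorithm the abstract refers to, the following Python sketch builds a kernel matrix from two-class Gaussian data, forms the normalized matrix D^{-1/2} K D^{-1/2}, and reads the classes from its second dominant eigenvector. The kernel f(t) = exp(-t/2), the scaling of distances by the dimension p, the sample sizes, the mean vector, and the sign-based assignment (used here in place of the k-means step of the Ng–Jordan–Weiss method) are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def kernel_spectral_embedding(X, k, f=lambda t: np.exp(-t / 2)):
    """Return the k dominant eigenvectors of D^{-1/2} K D^{-1/2}, rows normalized.

    X is an (n, p) data matrix; f is the kernel function (an assumed choice).
    """
    n, p = X.shape
    # Pairwise squared Euclidean distances, clipped to avoid tiny negatives.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)
    # Kernel matrix K_ij = f(||x_i - x_j||^2 / p).
    K = f(sq_dists / p)
    # Degree normalization: L = D^{-1/2} K D^{-1/2}.
    d = K.sum(axis=1)
    L = K / np.sqrt(np.outer(d, d))
    # eigh returns eigenvalues in ascending order; reverse to get dominant first.
    _, eigvecs = np.linalg.eigh(L)
    V = eigvecs[:, ::-1][:, :k]
    # Row normalization, as in the Ng-Jordan-Weiss method.
    return V / np.linalg.norm(V, axis=1, keepdims=True)


# Hypothetical two-class Gaussian dataset with means -mu and +mu.
rng = np.random.default_rng(0)
n, p = 200, 50
labels = np.repeat([0, 1], n // 2)
mu = np.zeros(p)
mu[0] = 5.0
signs = np.where(labels == 0, -1.0, 1.0)
X = rng.standard_normal((n, p)) + signs[:, None] * mu

V = kernel_spectral_embedding(X, k=2)
# Simplified assignment: sign of the second dominant eigenvector.
predicted = (V[:, 1] > 0).astype(int)
agreement = np.mean(predicted == labels)
# Cluster labels are only defined up to a swap of the two classes.
accuracy = max(agreement, 1.0 - agreement)
print(f"clustering accuracy: {accuracy:.2f}")
```

The dominant eigenvector of the normalized matrix is proportional to the square roots of the degrees and carries no class information, which is why the sketch looks at the second one. With more than two classes, the rows of V would typically be passed to k-means instead.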
Document type:
Conference paper
IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Dec 2015, Cancun, Mexico

https://hal.archives-ouvertes.fr/hal-01242494
Contributor: Matha Deghel
Submitted on: Saturday, December 12, 2015 - 18:20:39
Last modified on: Friday, February 17, 2017 - 16:10:37
Document(s) archived on: Saturday, April 29, 2017 - 12:13:28

File

camsap_spectralclustering.pdf
Explicit agreement for this deposit

Identifiers

  • HAL Id : hal-01242494, version 1

Citation

Romain Couillet, Florent Benaych-Georges. Understanding Big Data Spectral Clustering. IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Dec 2015, Cancun, Mexico. <hal-01242494>
