Book chapter, Year: 2020

Approximating Spectral Clustering via Sampling: a Review

Abstract

Spectral clustering refers to a family of well-known unsupervised learning algorithms. Rather than attempting to cluster points in their native domain, one constructs a (usually sparse) similarity graph and computes the principal eigenvectors of its Laplacian. The eigenvectors are then interpreted as transformed points and fed into a k-means clustering algorithm. As a result of this non-linear transformation, it becomes possible to use a simple centroid-based algorithm in order to identify non-convex clusters, something that was otherwise impossible. Unfortunately, what makes spectral clustering so successful is also its Achilles heel: forming a graph and computing its dominant eigenvectors can be computationally prohibitive when dealing with more than a few tens of thousands of points. In this chapter, we review the principal research efforts aiming to reduce this computational cost. We focus on methods that come with a theoretical control on the clustering performance and incorporate some form of sampling in their operation. Such methods abound in the machine learning, numerical linear algebra, and graph signal processing literature and, amongst others, include Nyström approximation, landmarks, coarsening, coresets, and compressive spectral clustering. We present the approximation guarantees available for each and discuss practical merits and limitations. Surprisingly, despite the breadth of the literature explored, we conclude that there is still a gap between theory and practice: the most scalable methods are only intuitively motivated or loosely controlled, whereas those that come with end-to-end guarantees rely on strong assumptions or enable only a limited gain in computation time.
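The pipeline described in the abstract (similarity graph, Laplacian eigenvectors, k-means) can be illustrated with a minimal NumPy sketch. This is not the chapter's code: the symmetric k-nearest-neighbour graph, the unnormalized Laplacian L = D - W, the farthest-point k-means initialization, and every function and parameter name below are assumptions chosen for brevity. The eigenvectors used are those associated with the smallest Laplacian eigenvalues.

```python
import numpy as np


def make_rings(n_per_ring, radii=(1.0, 3.0), noise=0.02, seed=0):
    """Toy data (hypothetical helper): two concentric, non-convex rings."""
    rng = np.random.default_rng(seed)
    angles = np.linspace(0.0, 2.0 * np.pi, n_per_ring, endpoint=False)
    points, truth = [], []
    for label, r in enumerate(radii):
        rr = r + noise * rng.standard_normal(n_per_ring)
        points.append(np.column_stack([rr * np.cos(angles), rr * np.sin(angles)]))
        truth.append(np.full(n_per_ring, label))
    return np.vstack(points), np.concatenate(truth)


def knn_graph(X, n_neighbors):
    """Symmetric, unweighted k-nearest-neighbour adjacency matrix (dense)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared distances
    np.fill_diagonal(d2, np.inf)                    # exclude self-loops
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    n = X.shape[0]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    return np.maximum(W, W.T)                       # keep an edge if either end chose it


def kmeans(Y, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels


def spectral_clustering(X, k, n_neighbors=10):
    """Graph -> Laplacian -> k eigenvectors -> k-means on the rows."""
    W = knn_graph(X, n_neighbors)
    L = np.diag(W.sum(axis=1)) - W   # unnormalized Laplacian (an assumption)
    _, vecs = np.linalg.eigh(L)      # eigenvalues returned in ascending order
    U = vecs[:, :k]                  # one k-dimensional embedded point per row
    return kmeans(U, k)


if __name__ == "__main__":
    X, truth = make_rings(100)
    labels = spectral_clustering(X, k=2)
    print(labels[:5], labels[-5:])
```

The dense distance matrix and the full eigendecomposition in this sketch scale quadratically and cubically with the number of points, which is the cost the sampling-based methods reviewed in the chapter aim to reduce.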
Main file
main_Chapter.pdf (639.09 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02468312, version 1 (05-02-2020)


Cite

Nicolas Tremblay, Andreas Loukas. Approximating Spectral Clustering via Sampling: a Review. Sampling Techniques for Supervised or Unsupervised Tasks, 2020, 978-3-030-29348-2. ⟨10.1007/978-3-030-29349-9_5⟩. ⟨hal-02468312⟩