Subquadratic High-Dimensional Hierarchical Clustering - HAL open archive
Conference paper, Year: 2019

Subquadratic High-Dimensional Hierarchical Clustering

Amir Abboud
  • Role: Author
  • PersonId: 1057027
Hussein Houdrougé
  • Role: Author
  • PersonId: 1058115

Abstract

We consider the widely used average-linkage, single-linkage, and Ward's methods for computing hierarchical clusterings of high-dimensional Euclidean inputs. It is easy to show that there is no efficient implementation of these algorithms in high-dimensional Euclidean space, since they implicitly require solving the closest pair problem, a notoriously difficult problem. However, how fast can these algorithms be implemented if we allow approximation? More precisely, these algorithms successively merge the clusters that are at closest average distance (for average-linkage), at minimum distance (for single-linkage), or that induce the least sum-of-squares error (for Ward's). We ask whether one could obtain a significant running-time improvement if the algorithm can merge γ-approximate closest clusters (namely, clusters whose distance (average, minimum, or sum-of-squares error) is at most γ times the distance of the closest clusters). We show that one can indeed take advantage of this relaxation and compute the approximate hierarchical clustering tree using Õ(n) γ-approximate nearest neighbor queries. This leads to an algorithm running in time Õ(nd) + n^{1+O(1/γ)} for d-dimensional Euclidean space. We then provide experiments showing that these algorithms perform as well as the non-approximate version for classic classification tasks while achieving a significant speed-up.
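The relaxed merge rule described in the abstract can be illustrated with a minimal sketch. It is a brute-force illustration for single-linkage only, not the authors' subquadratic algorithm: it scans all cluster pairs exactly at every step, whereas the paper obtains its running time by answering γ-approximate nearest neighbor queries. The function names (`single_linkage_distance`, `gamma_approx_clustering`), the random choice among admissible pairs, and the merge-list output format are assumptions made for this example.

```python
import math
import random
from itertools import combinations


def single_linkage_distance(a, b):
    """Minimum Euclidean distance between a point of cluster a and a point of cluster b."""
    return min(math.dist(p, q) for p in a for q in b)


def gamma_approx_clustering(points, gamma=1.0, seed=0):
    """Agglomerative single-linkage with gamma-approximate merges (brute force).

    At each step, any pair of clusters whose distance is at most gamma times
    the current closest-pair distance may be merged; one such pair is picked
    at random. With gamma == 1 this is ordinary single-linkage (up to ties).
    Returns a list of merges (id_a, id_b, distance); the cluster created at
    step t receives the id len(points) + t.
    """
    if gamma < 1.0:
        raise ValueError("gamma must be at least 1")
    rng = random.Random(seed)
    clusters = {i: [tuple(p)] for i, p in enumerate(points)}
    next_id = len(points)
    merges = []
    while len(clusters) > 1:
        # Exact scan over all pairs: quadratic per step, for illustration only.
        pairs = [
            (single_linkage_distance(clusters[i], clusters[j]), i, j)
            for i, j in combinations(sorted(clusters), 2)
        ]
        best = min(d for d, _, _ in pairs)
        # Every pair within a factor gamma of the optimum is an allowed merge.
        admissible = [t for t in pairs if t[0] <= gamma * best]
        d, i, j = rng.choice(admissible)
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        merges.append((i, j, d))
        next_id += 1
    return merges


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.5)]
exact = gamma_approx_clustering(points, gamma=1.0)
relaxed = gamma_approx_clustering(points, gamma=2.0)
```

With `gamma=1.0` the merge order on these four points is forced. With `gamma=2.0` the first step may merge either of the two tight pairs, since both lie within a factor 2 of the closest-pair distance.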
Main file
main.pdf (293.81 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02360775, version 1 (13-11-2019)

Identifiers

  • HAL Id: hal-02360775, version 1

Cite

Amir Abboud, Vincent Cohen-Addad, Hussein Houdrougé. Subquadratic High-Dimensional Hierarchical Clustering. NeurIPS'19 - 33rd Conference on Neural Information Processing Systems, Dec 2019, Vancouver, Canada. ⟨hal-02360775⟩
