Report (Research Report) Year: 2009

Finding Friends and Followers in Sub-linear Time

David Arthur
  • Role: Author
  • PersonId: 864353
Anneesh Sharma
  • Role: Author
  • PersonId: 864354

Abstract

The approximate Nearest Neighbor (NN) search problem asks for a pre-processing of a given set of points $P$ such that, given any query point $q$, one can retrieve a point in $P$ that is approximately closest to $q$. Of particular interest is the case of points lying in high dimensions, which has seen rapid developments since the introduction of the Locality-Sensitive Hashing (LSH) data structure by Indyk and Motwani. Combined with a space decomposition by Har-Peled, the LSH data structure can answer approximate NN queries in sub-linear time using space polynomial in both the dimension $d$ and the number of points $n$. Unfortunately, it is not known whether Har-Peled's space decomposition can be maintained efficiently under point insertions and deletions, so the above solution only works in a static setting. In this paper we present a variant of Har-Peled's decomposition, based on random semi-regular grids, which achieves the same query time and can additionally be maintained efficiently even under adversarial point insertions and deletions. The outcome is a new data structure that answers approximate NN queries efficiently in dynamic settings.

A related problem, known as Reverse Nearest Neighbor (RNN) search, is to find the influence set of a given query point $q$, i.e., the subset of points of $P$ that have $q$ as their nearest neighbor. Although this problem has many practical applications, very little is known about its complexity. In particular, no algorithm is known to solve it in high dimensions in sub-linear time using sub-exponential space. In this paper we show how to pre-process the data points so that Har-Peled's space decomposition, combined with modified LSH data structures, can solve an approximate variant of the RNN problem efficiently, using polynomial space. The query time of our approach is bounded by two terms: the first is sub-linear in the size of $P$ and corresponds roughly to the unavoidable time needed to locate the query point in the data structure; the second is proportional to the size of the output, which is a set of points rather than a single point as in (approximate) NN queries. An interesting feature of our RNN solution is that it applies equally well in monochromatic and bichromatic settings.
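
To make the two query types concrete, here is a minimal Python sketch. It is not the data structure of the report: it uses a standard single-scale Euclidean LSH scheme (random Gaussian projections quantized into buckets of a fixed width) with no space decomposition, so it carries no approximation guarantee across scales, and it computes influence sets by exact brute force only as a reference baseline. All names and parameters (`LSHIndex`, `num_tables`, `hashes_per_table`, `width`, `reverse_nn_monochromatic`, `reverse_nn_bichromatic`) are hypothetical, and ties in distance are counted in favor of the query point.

```python
import math
import random
from collections import defaultdict


class LSHIndex:
    """Illustrative single-scale Euclidean LSH: h(x) = floor((a.x + b) / width)."""

    def __init__(self, dim, num_tables=8, hashes_per_table=4, width=4.0, seed=0):
        rng = random.Random(seed)  # fixed seed so runs are repeatable
        self.width = width
        self.tables = []  # one (hash functions, buckets) pair per table
        for _ in range(num_tables):
            funcs = [
                ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
                for _ in range(hashes_per_table)
            ]
            self.tables.append((funcs, defaultdict(set)))

    def _key(self, funcs, p):
        # Concatenate the individual hash values into one bucket key.
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, p)) + b) / self.width)
            for a, b in funcs
        )

    def insert(self, p):
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, p)].add(p)

    def delete(self, p):
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, p)].discard(p)

    def query(self, q):
        # Gather every point sharing a bucket with q, then keep the closest one.
        candidates = set()
        for funcs, buckets in self.tables:
            candidates |= buckets.get(self._key(funcs, q), set())
        return min(candidates, key=lambda p: math.dist(p, q), default=None)


def reverse_nn_bichromatic(q, clients, sites):
    """Exact baseline: clients that are at least as close to q as to any site."""
    return [c for c in clients
            if all(math.dist(c, q) <= math.dist(c, s) for s in sites)]


def reverse_nn_monochromatic(q, points):
    """Exact O(n^2) baseline: each point competes against all other points."""
    return [p for p in points
            if all(math.dist(p, q) <= math.dist(p, x) for x in points if x != p)]


# Points are tuples so that they can be stored in hash buckets.
P = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
index = LSHIndex(dim=2)
for p in P:
    index.insert(p)

hit = index.query((5.0, 0.0))   # a stored point always collides with itself
index.delete((5.0, 0.0))        # updating the hash tables needs no rebuild

influence = reverse_nn_monochromatic((0.4, 0.0), P)  # [(0.0, 0.0), (1.0, 0.0)]
```

In the monochromatic case the query competes with the other points of the same set, whereas in the bichromatic case the clients are only compared against a separate set of sites. The brute-force functions take time linear or quadratic in the input, which is the cost that a sub-linear RNN structure is meant to avoid.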
Main file
RR-7084.pdf (385.35 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00429459, version 1 (06-11-2009)
inria-00429459, version 2 (09-11-2009)
inria-00429459, version 3 (31-10-2010)
inria-00429459, version 4 (08-11-2010)
inria-00429459, version 5 (22-11-2010)

Identifiers

  • HAL Id: inria-00429459, version 1

Cite

David Arthur, Steve Y. Oudot, Anneesh Sharma. Finding Friends and Followers in Sub-linear Time. [Research Report] RR-7084, 2009. ⟨inria-00429459v1⟩
