Taming the curse of dimensionality for perturbed token identification
Abstract
In the context of data tokenization, we model a token as a vector of a finite-dimensional metric space E. Given a finite subset of E, called the token set, we address the problem of deciding whether a given token lies in a small neighborhood of another token. We derive conditions that characterize the nearest token of a given token and show that these conditions are fulfilled asymptotically as the dimension of E tends to infinity. Whereas classical nearest neighbor search is inefficient for this problem, we propose a new probabilistic algorithm that becomes efficient when the dimension of E is large enough.
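For orientation, the decision problem stated in the abstract can be sketched with the classical exhaustive nearest neighbor search that the abstract calls inefficient. This is only a baseline illustration, not the probabilistic algorithm proposed by the authors, which the abstract does not describe. The Euclidean metric, the function names `nearest_token` and `identify_perturbed`, and the `radius` parameter are all assumptions made for this example.

```python
import math
from typing import Optional, Sequence, Tuple

Token = Sequence[float]


def nearest_token(query: Token, token_set: Sequence[Token]) -> Tuple[int, float]:
    """Return (index, distance) of the token in token_set closest to query.

    Exhaustive scan: one distance evaluation per token, each costing time
    proportional to the dimension. Assumes the Euclidean metric (hypothetical
    choice; the abstract only requires a metric space).
    """
    if not token_set:
        raise ValueError("token_set must be non-empty")
    best_index, best_dist = 0, math.dist(query, token_set[0])
    for i in range(1, len(token_set)):
        d = math.dist(query, token_set[i])
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


def identify_perturbed(
    query: Token, token_set: Sequence[Token], radius: float
) -> Optional[int]:
    """Return the index of the nearest token if query lies within `radius`
    of it, otherwise None. `radius` is a hypothetical neighborhood size."""
    index, dist = nearest_token(query, token_set)
    return index if dist <= radius else None
```

Under these assumptions, a call such as `identify_perturbed((1.1, 0.9, 1.0), [(0, 0, 0), (1, 1, 1), (5, 5, 5)], 0.5)` compares the query against every token in the set, which is the cost the proposed algorithm aims to avoid in high dimension.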
Origin: Files produced by the author(s)