Conference papers

Taming the curse of dimensionality for perturbed token identification

Abstract : In the context of data tokenization, we model a token as a vector of a finite-dimensional metric space E. Given a finite subset of E, called the token set, we address the problem of deciding whether a given token lies in a small neighborhood of another token. We derive conditions that characterize the nearest token of a given one and show that these conditions are fulfilled asymptotically as the dimension of E tends to infinity. Whereas classical nearest neighbor search is inefficient for this problem, we propose a new probabilistic algorithm that becomes efficient when the dimension of E is large enough.
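
To make the problem statement concrete, the following is a minimal sketch of the classical baseline the abstract contrasts against: a linear-scan nearest neighbor search followed by a neighborhood test. It is not the probabilistic algorithm proposed in the paper, whose details the abstract does not give. The Euclidean metric, the radius `eps`, and the names `nearest_token` and `identify` are assumptions chosen for illustration only.

```python
import math
from typing import Optional, Sequence, Tuple

Token = Sequence[float]


def nearest_token(query: Token, token_set: Sequence[Token]) -> Tuple[int, float]:
    """Linear scan: return (index, distance) of the token closest to `query`.

    Assumes the Euclidean metric on E (an illustrative choice).
    The cost grows with both the number of tokens and the dimension of E.
    Returns (-1, inf) when the token set is empty.
    """
    best_index, best_dist = -1, math.inf
    for i, token in enumerate(token_set):
        d = math.dist(query, token)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


def identify(query: Token, token_set: Sequence[Token], eps: float) -> Optional[int]:
    """Return the index of the token whose eps-neighborhood contains `query`.

    `eps` is a hypothetical neighborhood radius; returns None when the
    nearest token is farther away than `eps`.
    """
    index, dist = nearest_token(query, token_set)
    return index if dist <= eps else None


# Toy token set in dimension 3 (illustrative values only).
tokens = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (5.0, 0.0, 0.0)]
perturbed = (0.9, 1.1, 1.0)  # small perturbation of tokens[1]
far = (3.0, 3.0, 3.0)        # not close to any token

print(identify(perturbed, tokens, eps=0.5))
print(identify(far, tokens, eps=0.5))
```

With these toy values, the perturbed query is matched to the second token and the distant query is rejected. Every query requires a distance computation against each token, which is the inefficiency the abstract attributes to the classical search.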


https://hal.archives-ouvertes.fr/hal-02865114
Contributor : Jérémy Rouot
Submitted on : Tuesday, September 1, 2020 - 11:42:09 AM
Last modification on : Wednesday, July 7, 2021 - 9:28:03 AM

File

token2020.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02865114, version 2

Citation

Olga Assainova, Jérémy Rouot, Ehsan Sedgh-Gooya. Taming the curse of dimensionality for perturbed token identification. 10th International Conference on Image Processing Theory, Tools and Applications, Nov 2020, Paris, France. ⟨hal-02865114v2⟩
