A Fast Word Retrieval Technique Based on Kernelized Locality Sensitive Hashing
Abstract
In this paper, we present a new, faster word retrieval approach that can handle heterogeneous document image collections. A set of image features (statistical and Gabor wavelet features) is extracted to represent each word image. These features are used to build hash tables for fast retrieval of similar images from a very large image dataset. Decomposing and embedding high-dimensional features and complex distance functions into a low-dimensional Hamming space makes the search efficient. However, existing methods do not apply to high-dimensional kernelized data when the feature embedding underlying the kernel is unknown. This paper presents a generalization of locality sensitive hashing (LSH) to arbitrary kernels. The proposed algorithm provides sub-linear time similarity search and works for a wide class of similarity functions.
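The abstract does not spell out how hashing works when only kernel values are available. The following is a minimal sketch, assuming the construction follows the well-known kernelized LSH (KLSH) recipe of Kulis and Grauman; we cannot confirm it is this paper's exact algorithm. Each hash bit is the sign of a weighted sum of kernel values between an item and p anchor images, with weights w = K^{-1/2} e_S, where K is the centered anchor kernel matrix and e_S marks a random subset of t anchors. No explicit feature embedding is ever computed. The RBF kernel, the parameter values, the class name `KernelLSH`, and the random vectors standing in for statistical and Gabor word features are all hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel between rows of A and B (assumed kernel choice)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KernelLSH:
    """Hypothetical KLSH-style index: hashes via kernel values only, never explicit phi(x)."""

    def __init__(self, anchors, n_bits=12, t=8, gamma=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.anchors, self.gamma = anchors, gamma
        p = len(anchors)
        K = rbf_kernel(anchors, anchors, gamma)
        self.col_mean, self.all_mean = K.mean(axis=0), K.mean()
        # Center the anchor kernel matrix (zero-mean data in feature space).
        Kc = K - self.col_mean[None, :] - self.col_mean[:, None] + self.all_mean
        # K^{-1/2} via eigendecomposition, discarding near-zero eigenvalues.
        vals, vecs = np.linalg.eigh(Kc)
        keep = vals > 1e-8
        inv_sqrt = (vecs[:, keep] / np.sqrt(vals[keep])) @ vecs[:, keep].T
        # Each bit uses w = K^{-1/2} e_S for a random subset S of t anchors.
        E = np.zeros((p, n_bits))
        for b in range(n_bits):
            E[rng.choice(p, size=t, replace=False), b] = 1.0
        self.W = inv_sqrt @ E
        self.table, self.data = {}, None

    def _code(self, x):
        k = rbf_kernel(x[None, :], self.anchors, self.gamma)[0]
        # Center the query's kernel vector consistently with the anchors.
        k = k - k.mean() - self.col_mean + self.all_mean
        bits = (k @ self.W) > 0          # one sign per hash bit (Hamming code)
        return bits.tobytes()            # hashable bucket key

    def build(self, data):
        self.data = data
        for i, x in enumerate(data):
            self.table.setdefault(self._code(x), []).append(i)

    def query(self, x, top=5):
        # Only items in the query's bucket are scored, not the whole dataset.
        cand = self.table.get(self._code(x), [])
        if not cand:
            return []
        sims = rbf_kernel(x[None, :], self.data[cand], self.gamma)[0]
        return [cand[j] for j in np.argsort(-sims)[:top]]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    features = rng.normal(size=(500, 32))  # stand-in for word-image features
    index = KernelLSH(anchors=features[:100])
    index.build(features)
    print(index.query(features[7], top=3))
```

Querying with an indexed item should return that item first, since it lands in its own bucket and has maximal kernel similarity to itself. A single table is kept here for brevity; in practice several independent tables (or probing nearby buckets) are typically used to raise recall while keeping the candidate set, and hence the search time, well below a linear scan.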