A Fast Word Retrieval Technique Based on Kernelized Locality Sensitive Hashing

Abstract : In this paper, we present a new, faster word retrieval approach that can handle heterogeneous document image collections. A set of image features (statistical and Gabor wavelet) is extracted to represent each word image. These features are used to build a hash table for fast retrieval of similar images from a very large image dataset. Embedding high-dimensional features and complex distance functions into a low-dimensional Hamming space makes the search efficient. However, existing methods do not apply to high-dimensional kernelized data when the feature embedding underlying the kernel is unknown. The paper therefore presents a generalization of locality sensitive hashing (LSH) to arbitrary kernels. The proposed algorithm provides sub-linear time similarity search and works for a wide class of similarity functions.
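
To illustrate the idea of hashing feature vectors into a Hamming space, the following is a minimal sketch of plain random-hyperplane LSH on explicit vectors. It is not the kernelized variant the paper presents, which is designed for kernels whose embedding is unknown. All function names and parameters (make_hyperplanes, hash_code, build_table, query, n_bits, seed) are hypothetical and chosen for illustration only.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, hyperplanes):
    """One bit per hyperplane: 1 if the dot product is non-negative, else 0."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions at which two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def build_table(features, hyperplanes):
    """Map each binary code to the indices of the vectors that hash to it."""
    table = defaultdict(list)
    for idx, vec in enumerate(features):
        table[hash_code(vec, hyperplanes)].append(idx)
    return table


def query(vec, table, hyperplanes):
    """Return indices stored in the query's bucket and in every bucket
    whose code differs from the query's code by exactly one bit."""
    code = hash_code(vec, hyperplanes)
    probes = [code] + [
        code[:i] + (1 - code[i],) + code[i + 1:] for i in range(len(code))
    ]
    hits = []
    for probe in probes:
        hits.extend(table.get(probe, []))
    return sorted(hits)
```

In this sketch a query inspects only n_bits + 1 buckets, not the whole collection, which is where the speed-up over a linear scan comes from. For example, after planes = make_hyperplanes(4, 8) and table = build_table(vectors, planes), the call query(vectors[0], table, planes) returns a candidate list that contains index 0. The candidates would still need to be re-ranked with the true distance.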

Contributor : Denis Maurel
Submitted on : Tuesday, July 22, 2014 - 9:30:07 AM
Last modification on : Saturday, October 26, 2019 - 2:05:56 AM


  • HAL Id : hal-01027461, version 1


Tanmoy Mondal, Nicolas Ragot, Jean-Yves Ramel, Umapada Pal. A Fast Word Retrieval Technique Based on Kernelized Locality Sensitive Hashing. 12th International Conference on Document Analysis and Recognition, Aug 2013, Washington DC, United States. pp.1195-1199. ⟨hal-01027461⟩


