A scalable pattern spotting system for historical documents - HAL open archive
Journal article, Pattern Recognition, Year: 2016

A scalable pattern spotting system for historical documents

Abstract

Information retrieval in historical documents has long consisted of spotting words. In this paper, we focus on graphical pattern spotting. Contrary to object detection and classification, where models of the object of interest may be trained, pattern spotting does not rely on any prior information regarding the query, nor on any predefined class of graphical objects. An offline sliding window approach may be suitable, provided that the challenge raised by high computational and storage costs is handled. We propose an unsupervised, segmentation-free approach that takes advantage of recent developments in computer vision to overcome these issues. We also investigate the use of new, compact descriptors for the data, namely the vectors of locally aggregated descriptors (VLAD) and Fisher Vectors, instead of the usual bag-of-visual-words approach. Results obtained on medieval manuscripts from the DocExplore project show that our approach achieves better retrieval results, with better time and memory efficiency, than standard approaches. Experiments show that VLAD and Fisher Vectors can be fruitfully used in the future for the description of historical documents. Additionally, we show that our system can easily be turned into a word spotting system with slight adaptation, and that it achieves results comparable to those recently published in the ICDAR 2015 keyword spotting challenge.
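
To make the VLAD descriptor named in the abstract more concrete, here is a minimal sketch of the general technique. It is not the authors' implementation. The function names `nearest_centroid` and `vlad_encode`, the toy 2-D descriptors, and the hand-written two-centroid codebook are all assumptions for illustration. A real system would typically learn the codebook by clustering local image features and may apply further normalization steps.

```python
import math


def nearest_centroid(descriptor, centroids):
    """Return the index of the centroid closest to `descriptor`
    (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for k, centroid in enumerate(centroids):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index


def vlad_encode(descriptors, centroids):
    """Aggregate local descriptors into a single VLAD vector.

    For each centroid, sum the residuals (descriptor - centroid) of the
    descriptors assigned to it, concatenate the K sums into a vector of
    length K * D, then L2-normalize the result.
    """
    k_count, dim = len(centroids), len(centroids[0])
    residuals = [[0.0] * dim for _ in range(k_count)]
    for x in descriptors:
        k = nearest_centroid(x, centroids)
        for j in range(dim):
            residuals[k][j] += x[j] - centroids[k][j]
    flat = [value for row in residuals for value in row]
    norm = math.sqrt(sum(value * value for value in flat))
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return [value / norm for value in flat] if norm > 0 else flat


# Toy data (assumed, for illustration only): a 2-word codebook in 2-D.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
vlad = vlad_encode(descriptors, centroids)
```

The output length is fixed at K * D regardless of how many local descriptors a window contains, which is what makes such vectors convenient to store and compare for a large number of sliding windows.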
File not deposited

Dates and versions

hal-02116648 , version 1 (01-05-2019)

Cite

Sovann En, Caroline Petitjean, Stephane Nicolas, Laurent Heutte. A scalable pattern spotting system for historical documents. Pattern Recognition, 2016, 54, pp.149-161. ⟨10.1016/j.patcog.2016.01.014⟩. ⟨hal-02116648⟩