User-driven Page Layout Analysis of historical printed Books

Abstract: Based on a study of the specific characteristics of historical printed books, we first explain the main sources of error in classical page layout analysis methods. We show that the bottom-up and top-down approaches each provide different kinds of useful information, none of which should be ignored if we want both a generic method and good segmentation results. We then propose a hybrid segmentation algorithm that builds two maps: a shape map, which focuses on connected components, and a background map, which provides information about the white areas corresponding to block separations in the page. Starting from this first segmentation, the extracted blocks can be classified according to scenarios produced by the user. These scenarios are defined very simply during an interactive stage, so the user can build processing sequences adapted to the different kinds of images likely to be encountered and to the user's own needs. The proposed “user-driven approach” efficiently segments and labels the high-level concepts required by the user, and achieved accuracy above 93% on the different data sets tested. User feedback and experimental results demonstrate the effectiveness and usability of our framework, mainly because the extraction rules can be defined without difficulty and the parameters are not sensitive to page layout variation.
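
Illustrative sketch (not taken from the paper): the two-map idea and the user-defined scenarios described in the abstract can be pictured roughly as follows. Everything here is an assumption made for illustration: the function names `shape_map`, `background_map` and `label_blocks`, the 8-connectivity, the horizontal white-run criterion with a `min_run` threshold, and the rule format are hypothetical and do not reproduce the authors' implementation.

```python
from collections import deque

def shape_map(page):
    """Bounding boxes (top, left, bottom, right) of 8-connected ink components.

    `page` is a list of rows; 1 = ink pixel, 0 = white pixel (assumed encoding).
    """
    h, w = len(page), len(page[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if page[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and page[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def background_map(page, min_run):
    """Mark white pixels lying in a horizontal white run of at least `min_run` pixels."""
    h, w = len(page), len(page[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        c = 0
        while c < w:
            if page[r][c] == 0:
                start = c
                while c < w and page[r][c] == 0:
                    c += 1
                if c - start >= min_run:  # long enough to count as a separation
                    for x in range(start, c):
                        out[r][x] = 1
            else:
                c += 1
    return out

def label_blocks(boxes, scenario):
    """Apply the user's rules in order; the first rule that matches gives the label."""
    labels = []
    for top, left, bottom, right in boxes:
        height, width = bottom - top + 1, right - left + 1
        for name, rule in scenario:
            if rule(height, width):
                labels.append(name)
                break
        else:
            labels.append("unlabelled")
    return labels

# Tiny hypothetical page: a 3x3 ink block on the left, a thin stroke on the right.
page = [
    [1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
# A "scenario" here is just an ordered list of (label, predicate) pairs.
scenario = [
    ("illustration", lambda height, width: height >= 3 and width >= 3),
    ("text", lambda height, width: True),
]
boxes = shape_map(page)
separations = background_map(page, min_run=4)
labels = label_blocks(boxes, scenario)
```

In this toy version the user only edits the `scenario` list, which loosely mirrors the abstract's point that extraction rules are defined interactively rather than hard-coded into the segmentation step.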


https://hal.archives-ouvertes.fr/hal-00150167
Contributor: Jean-Yves Ramel
Submitted on: Wednesday, June 2, 2010 - 8:37:51 AM
Last modification on: Wednesday, November 6, 2019 - 2:37:38 PM
Long-term archiving on: Friday, September 17, 2010 - 10:41:11 AM

File

ramel_v40.pdf
Files produced by the author(s)

Citation

Jean-Yves Ramel, Marie-Luce Demonet, Sébastien Busson. User-driven Page Layout Analysis of historical printed Books. International Journal on Document Analysis and Recognition, Springer Verlag, 2007, 9 (2-4), pp.243-261. ⟨10.1007/s10032-007-0040-6⟩. ⟨hal-00150167⟩
