A hierarchical and scalable model for contemporary document image segmentation

Asma Ouji (1), Yann Leydier (1), Frank Le Bourgeois (1)
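(1) imagine - Extraction de Caractéristiques et Identification (Feature Extraction and Identification team),
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information (Laboratory of Computer Science in Images and Information Systems)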
Abstract: In this paper, we introduce a novel color segmentation approach that is robust against digitization noise and adapted to contemporary document images. The system is scalable, hierarchical, versatile and fully automated, i.e. user-independent. It performs an adaptive binarization/quantization without penalizing information loss. This model may be used for many purposes. For instance, we rely on it to carry out the first steps leading to advertisement recognition in document images. Furthermore, the color segmentation output is used to localize text areas and improve OCR (Optical Character Recognition) performance. We ran tests on a variety of magazine images to highlight our contribution to the well-known OCR product ABBYY FineReader. We also obtained promising results with our ad detection system on a large set of test images with complex layouts.
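The abstract describes binarization and quantization as levels of one hierarchy. As a rough, hypothetical illustration of that general idea only (this is not the authors' algorithm, which works on color and is not described here), the sketch below recursively splits gray levels with Otsu's threshold: depth 1 yields a binarization, deeper levels yield a finer quantization. The names `otsu_threshold`, `hierarchical_quantize`, `quantize` and the `min_spread` stopping parameter are assumptions made for this example.

```python
from typing import List, Tuple


def otsu_threshold(values: List[int]) -> int:
    """Return t maximizing Otsu's between-class variance for the split
    {v <= t} / {v > t}. Assumes integer gray levels."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    total = len(values)
    best_t, best_var = lo, -1.0
    for t in range(lo, hi):
        below = [v for v in values if v <= t]
        above = [v for v in values if v > t]
        w0, w1 = len(below) / total, len(above) / total
        m0, m1 = sum(below) / len(below), sum(above) / len(above)
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:  # strict: keeps the first t among ties
            best_var, best_t = var, t
    return best_t


def hierarchical_quantize(values: List[int], depth: int,
                          min_spread: int = 0) -> List[Tuple[int, int]]:
    """Recursively split clusters with Otsu, up to `depth` levels.
    A cluster whose range (hi - lo) is <= min_spread is not split
    (hypothetical stopping rule). Returns (lo, hi) intervals, ordered."""
    def split(vals: List[int], d: int) -> List[Tuple[int, int]]:
        lo, hi = min(vals), max(vals)
        if d == 0 or hi - lo <= min_spread:
            return [(lo, hi)]
        t = otsu_threshold(vals)
        left = [v for v in vals if v <= t]
        right = [v for v in vals if v > t]
        return split(left, d - 1) + split(right, d - 1)

    return split(list(values), depth)


def quantize(values: List[int], depth: int, min_spread: int = 0) -> List[int]:
    """Label each value with the index of the interval containing it."""
    intervals = hierarchical_quantize(values, depth, min_spread)

    def label(v: int) -> int:
        for i, (lo, hi) in enumerate(intervals):
            if lo <= v <= hi:
                return i
        raise ValueError(f"value {v} not covered")

    return [label(v) for v in values]


if __name__ == "__main__":
    pixels = [10, 12, 11, 200, 205, 198, 100, 102]
    print(quantize(pixels, depth=1))                 # binarization
    print(quantize(pixels, depth=2, min_spread=10))  # finer quantization
```

With the toy pixel list above, depth 1 separates the bright values from the rest, while depth 2 with `min_spread=10` recovers three gray-level groups and leaves the tight bright group unsplit. A real color pipeline would replace the scalar Otsu split with a split in a color space.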
Document type: Journal article

https://hal.archives-ouvertes.fr/hal-01353045
Contributor: Équipe Gestionnaire Des Publications Si Liris
Submitted on: Wednesday, August 10, 2016 - 4:20:17 PM
Last modification on: Wednesday, October 31, 2018 - 12:24:25 PM


Citation

Asma Ouji, Yann Leydier, Frank Le Bourgeois. A hierarchical and scalable model for contemporary document image segmentation. Pattern Analysis and Applications, Springer Verlag, 2012, 16, pp.679-693. ⟨10.1007/s10044-012-0282-x⟩. ⟨hal-01353045⟩
