AColDSS: Robust Unsupervised Automatic Color Segmentation System for Noisy Heterogeneous Document Images

Louisa Kessi 1 Frank Le Bourgeois 1 Christophe Garcia 1
1 imagine - Extraction de Caractéristiques et Identification
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : We present the first fully automatic color analysis system suited for noisy heterogeneous documents. We developed a robust color segmentation system adapted for business documents and old handwritten documents with significant color complexity and dithered backgrounds. It is the first fully data-driven, pixel-based approach that needs no a priori information, training, or manual assistance. The system performs several operations to automatically segment color images, separate text from noise and graphics, and provide information about text color. The contribution of our work is four-fold. Firstly, it does not require any connected component analysis and simplifies the extraction of the layout and the recognition step undertaken by the OCR. Secondly, it uses color morphology to simultaneously segment both text and inverted text using conditional color dilation and erosion, even in cases where the two overlap. Thirdly, our system efficiently removes noise and speckles from dithered backgrounds and automatically suppresses graphical elements using geodesic measurements. Fourthly, we develop a method that splits overlapped characters and separates characters from graphics if they have different colors. The proposed Automatic Color Document Processing System correctly segmented 99 % of documents and has the potential to be adapted to different document images. The system outperformed the classical approach that uses binarization of the grayscale image.
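The abstract names conditional dilation without describing it. As a rough illustration only, the sketch below shows the standard binary form of the idea: a marker is dilated repeatedly but is never allowed to leave a mask, so it fills exactly the mask regions it touches. This is not the authors' color morphology; the function names, the 4-connected structuring element, and the toy arrays are all assumptions made for the example.

```python
import numpy as np

def dilate4(img):
    """One step of binary dilation with a 4-connected (cross) neighborhood."""
    out = img.copy()
    out[1:, :] |= img[:-1, :]   # propagate downward
    out[:-1, :] |= img[1:, :]   # propagate upward
    out[:, 1:] |= img[:, :-1]   # propagate to the right
    out[:, :-1] |= img[:, 1:]   # propagate to the left
    return out

def conditional_dilation(marker, mask):
    """Grow `marker` inside `mask` until nothing changes (hypothetical helper)."""
    current = marker & mask
    while True:
        grown = dilate4(current) & mask   # dilate, then clip to the mask
        if np.array_equal(grown, current):
            return current
        current = grown

# Toy mask with two separate regions; the marker touches only the left one.
mask = np.array([[1, 1, 0, 0, 1],
                 [1, 0, 0, 0, 1],
                 [0, 0, 0, 1, 1]], dtype=bool)
marker = np.zeros_like(mask)
marker[0, 0] = True

result = conditional_dilation(marker, mask)
```

Here `result` keeps only the left region of the mask, because the marker never reaches the right one. A color variant would need an ordering or distance on color vectors, which this sketch does not attempt.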
Document type :
Book sections

https://hal.archives-ouvertes.fr/hal-01272998
Contributor : Louisa Kessi
Submitted on : Thursday, February 11, 2016 - 4:30:40 PM
Last modification on : Tuesday, February 26, 2019 - 11:20:48 AM

Identifiers

  • HAL Id : hal-01272998, version 1

Citation

Louisa Kessi, Frank Le Bourgeois, Christophe Garcia. AColDSS: Robust Unsupervised Automatic Color Segmentation System for Noisy Heterogeneous Document Images. In Filipa Duarte (Ed.), European Project Space on Computer Vision, Graphics, Optics and Photonics, Berlin, Germany, March 2015. Sponsored and organized by INSTICC, published by SCITEPRESS, 2016. ⟨http://www.scitepress.org/DigitalLibrary/HomePage.aspx⟩. ⟨hal-01272998⟩
