Conference paper, year: 2012

Hierarchical Clustering Model for Pixel-Based Classification of Document Images

Abstract

We propose a method to learn and classify pixels in document images, e.g. to separate text from illustrations or other predefined classes. We extract texture information using a bank of Gabor filters and learn a hierarchical clustering model that can be used as a K-Nearest Neighbour (KNN) classifier. The model has advantages over other local document image classification methods that make it efficient for real industrial applications: it does not rely on the accuracy of preprocessing steps such as binarization or segmentation, it can be trained efficiently using zone-level annotations, and it seamlessly supports multi-class classification. The output of the classification is well suited to integration with neighbourhood regularisation methods, such as relaxation labelling, for further improvement. We demonstrate the performance of the method on a public dataset containing complex documents from magazines and technical journals.
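
The pipeline outlined in the abstract (Gabor texture features per pixel, then nearest-neighbour classification trained from zone-level labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the hierarchical clustering model is replaced here by a plain brute-force KNN search, and the kernel size, wavelength, sigma, the four orientations, the helper names and the synthetic striped "page" are all assumptions chosen only to keep the example small and dependency-free.

```python
import math
from collections import Counter

def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a Gabor filter as a size x size list of lists (size odd)."""
    half = size // 2
    kernel = []
    for dy in range(-half, half + 1):
        row = []
        for dx in range(-half, half + 1):
            # Rotate coordinates so the sinusoid runs along direction theta.
            xr = dx * math.cos(theta) + dy * math.sin(theta)
            yr = -dx * math.sin(theta) + dy * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    # Remove the DC component so flat regions give a (near) zero response.
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def pixel_features(image, bank):
    """Map (y, x) to the absolute response of each filter in the bank."""
    height, width = len(image), len(image[0])
    features = {}
    for y in range(height):
        for x in range(width):
            vec = []
            for kernel in bank:
                half = len(kernel) // 2
                acc = 0.0
                for ky, row in enumerate(kernel):
                    for kx, weight in enumerate(row):
                        # Clamp coordinates at the image border.
                        yy = min(max(y + ky - half, 0), height - 1)
                        xx = min(max(x + kx - half, 0), width - 1)
                        acc += weight * image[yy][xx]
                vec.append(abs(acc))
            features[(y, x)] = vec
    return features

def knn_classify(query, samples, k=3):
    """Majority label among the k samples closest to query (Euclidean)."""
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def synthetic_page(text_on_left, size=16):
    """Toy page (assumption): one half striped like text, the other flat grey."""
    def value(x):
        striped = (x < size // 2) == text_on_left
        return float(x % 2) if striped else 0.5
    return [[value(x) for x in range(size)] for _ in range(size)]

# Hypothetical bank: four orientations, one scale.
BANK = [gabor_kernel(5, 2.0, i * math.pi / 4, 2.0) for i in range(4)]

# Zone-level annotation: the whole left half is "text", the rest "other".
train_feats = pixel_features(synthetic_page(True), BANK)
samples = [(vec, "text" if x < 8 else "other")
           for (y, x), vec in train_feats.items()]

# Classify two pixels of a page whose zones are swapped.
test_feats = pixel_features(synthetic_page(False), BANK)
predicted_left = knn_classify(test_feats[(8, 3)], samples)
predicted_right = knn_classify(test_feats[(8, 12)], samples)
print(predicted_left, predicted_right)
```

Because the labels come from whole zones rather than from individual pixels, a few training samples near the zone boundary carry mixed texture; the majority vote over k neighbours is what absorbs that noise in this sketch. A real system would use a multi-scale filter bank and an index structure (such as the hierarchical clustering model the abstract mentions) instead of sorting every training sample per query.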
No file deposited

Dates and versions

hal-00709249, version 1 (18-06-2012)

Identifiers

  • HAL Id: hal-00709249, version 1

Cite

Remi Vieux, Jean-Philippe Domenger. Hierarchical Clustering Model for Pixel-Based Classification of Document Images. International Conference on Pattern Recognition, Nov 2012, Tsukuba, Japan. ⟨hal-00709249⟩

Collections

CNRS
