Journal article. International Journal on Document Analysis and Recognition. Year: 2012

A Categorization System for Handwritten Documents

Abstract

This paper presents a complete system able to categorize handwritten documents, i.e. to classify documents according to their topic. The categorization approach is based on the detection of some discriminative keywords prior to the use of the well-known tf-idf representation for document categorization. Two keyword extraction strategies are explored. The first one proceeds to the recognition of the whole document. However, the performance of this strategy strongly decreases when the lexicon size increases. The second strategy only extracts the discriminative keywords in the handwritten documents. This information extraction strategy relies on the integration of a rejection model (or anti-lexicon model) in the recognition system. Experiments have been carried out on an unconstrained handwritten document database coming from an industrial application concerning the processing of incoming mail. Results show that the discriminative keyword extraction system leads to better recall/precision trade-offs than the full recognition strategy. The keyword extraction strategy also outperforms the full recognition strategy for the categorization task.
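To make the tf-idf step concrete, here is a minimal sketch of categorizing a document from its extracted keywords. It is not the authors' implementation: the function names, the toy keyword lists, the category labels, and the nearest-neighbour cosine decision rule are all assumptions chosen for illustration, and the keyword lists stand in for the output of the handwriting recognition stage.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each keyword, how many documents contain it."""
    return Counter(word for doc in docs for word in set(doc))


def tfidf(doc, df, n_docs):
    """Return a {keyword: tf-idf weight} dict; keywords unseen in training are ignored."""
    tf = Counter(word for word in doc if word in df)
    total = sum(tf.values())
    if total == 0:
        return {}
    return {w: (c / total) * math.log(n_docs / df[w]) for w, c in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def categorize(doc, labelled_docs):
    """Assign the label of the most similar training document (assumed decision rule)."""
    docs = [keywords for keywords, _ in labelled_docs]
    df = document_frequencies(docs)
    query = tfidf(doc, df, len(docs))
    scored = [(cosine(query, tfidf(keywords, df, len(docs))), label)
              for keywords, label in labelled_docs]
    return max(scored)[1]


# Hypothetical keyword lists, as if extracted from incoming mail.
training = [
    (["invoice", "payment", "invoice"], "billing"),
    (["contract", "cancel", "termination"], "cancellation"),
    (["payment", "refund"], "billing"),
]

print(categorize(["cancel", "contract"], training))  # prints "cancellation"
```

Because only keywords present in the training vocabulary contribute to the vector, a recognizer that outputs just the discriminative keywords supplies everything this representation needs, which is the intuition behind the second strategy described in the abstract.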
Main file
ijdar2012.pdf (735.17 KB)
Origin: files produced by the author(s)

Dates and versions

hal-00778177, version 1 (18-01-2013)

Identifiers

  • HAL Id: hal-00778177, version 1

Cite

T. Paquet, L. Heutte, G. Koch, C. Chatelain. A Categorization System for Handwritten Documents. International Journal on Document Analysis and Recognition, 2012, 15 (4), pp.315-330. ⟨hal-00778177⟩