A Categorization System for Handwritten Documents
Abstract
This paper presents a complete system for categorizing handwritten documents, i.e. classifying documents according to their topic. The categorization approach is based on detecting discriminative keywords, which are then used in the well-known tf-idf representation for document categorization. Two keyword extraction strategies are explored. The first performs recognition of the whole document. However, the performance of this strategy decreases strongly as the lexicon size increases. The second strategy extracts only the discriminative keywords in the handwritten documents. This information extraction strategy relies on the integration of a rejection model (or anti-lexicon model) in the recognition system. Experiments have been carried out on an unconstrained handwritten document database coming from an industrial application concerning the processing of incoming mail. Results show that the discriminative keyword extraction system leads to better recall/precision trade-offs than the full recognition strategy. The keyword extraction strategy also outperforms the full recognition strategy on the categorization task.
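The tf-idf step mentioned above can be illustrated with a minimal sketch. It assumes the keyword detector has already produced a list of detected keywords per document. The function names, the toy keywords, the log(N/df) idf variant and the cosine-similarity comparison are all illustrative assumptions, not the system described in the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs, keywords):
    """Build one sparse tf-idf vector (a dict) per document.

    docs: list of lists of detected keywords (hypothetical detector output).
    keywords: the set of discriminative keywords to keep.
    """
    n = len(docs)
    # Document frequency: number of documents containing each keyword.
    df = {k: sum(1 for d in docs if k in d) for k in keywords}
    vectors = []
    for d in docs:
        counts = Counter(w for w in d if w in df)
        total = sum(counts.values())
        vec = {}
        for k, c in counts.items():
            tf = c / total              # relative term frequency
            idf = math.log(n / df[k])   # one common idf variant (assumption)
            vec[k] = tf * idf
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy example: two reference documents and one query document.
keywords = {"invoice", "refund", "contract", "cancel"}
docs = [
    ["invoice", "refund", "invoice"],  # billing-like document
    ["contract", "cancel"],            # termination-like document
    ["refund", "invoice"],             # document to categorize
]
vecs = tfidf_vectors(docs, keywords)
scores = [cosine(vecs[2], vecs[0]), cosine(vecs[2], vecs[1])]
```

In this toy setting the query document shares keywords only with the first reference document, so its similarity to the second is zero. A real categorizer would compare against topic-level models trained on many documents rather than single references.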
Origin: Files produced by the author(s)