Discrimination between digits and outliers in handwritten documents applied to the extraction of numerical fields
Abstract
In this article, we propose a system for extracting numerical fields from unconstrained handwritten documents. The system is based on a recognition-driven segmentation stage followed by a syntactical analysis that detects the sequences that may compose a numerical field. We focus here on the design of a digit classifier, embedded in the segmentation/recognition process, that is able to discriminate digits from outliers such as words, word fragments, noise, etc. To this end, we have developed a light classifier that is applied before a standard digit classifier in order to reject "obvious outliers". Several classifiers have been compared in terms of ROC curve and processing time.
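The two-stage arrangement described above can be illustrated with a minimal sketch. It only shows the general cascade idea (a cheap rejection test placed in front of a costlier digit classifier) and is not the authors' implementation: the names `make_cascade`, `light_score`, `digit_classifier`, and `reject_threshold`, as well as the toy aspect-ratio feature, are assumptions introduced here for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Features = Sequence[float]
Prediction = Tuple[int, float]  # (digit label, confidence)


def make_cascade(
    light_score: Callable[[Features], float],
    digit_classifier: Callable[[Features], Prediction],
    reject_threshold: float,
) -> Callable[[Features], Optional[Prediction]]:
    """Build a classifier that rejects obvious outliers before the costly stage."""

    def classify(x: Features) -> Optional[Prediction]:
        # Stage 1: cheap digit-vs-outlier score; low scores are rejected outright.
        if light_score(x) < reject_threshold:
            return None
        # Stage 2: the standard digit classifier runs only on surviving candidates.
        return digit_classifier(x)

    return classify


# Hypothetical stand-ins, used only to make the sketch runnable.
def toy_light_score(x: Features) -> float:
    # Assumption: x[0] is a width/height aspect ratio; very wide shapes
    # (e.g. whole words) are treated as obvious outliers.
    return 1.0 if x[0] <= 1.5 else 0.0


calls: List[Features] = []  # records which inputs reach the second stage


def toy_digit_classifier(x: Features) -> Prediction:
    calls.append(x)
    return (7, 0.9)


cascade = make_cascade(toy_light_score, toy_digit_classifier, reject_threshold=0.5)
```

In such a cascade, varying `reject_threshold` trades missed digits against outliers passed on to the second stage, which is the kind of trade-off a ROC curve summarizes, while the processing time saved depends on how many candidates the first stage discards.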
Domains
Text and document processing
Origin: Files produced by the author(s)