Hybrid OCR combination for ancient documents

Hubert Cecotti 1, Abdel Belaïd 1
1 READ - READ
LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : Commercial Optical Character Recognition (OCR) systems have improved a lot in the last few years. Their main strength is their ability to process many different kinds of documents. However, this generality can also be a weakness, as they cannot perfectly recognize documents that differ greatly from typical present-day documents. In this paper we propose a system combining several OCRs with a specialized ICR (Intelligent Character Recognition) engine based on a convolutional neural network. Instead of just running several OCRs in parallel and applying a fusion rule to the results, a specialized neural network with an adaptive topology is added to complement the OCRs, according to the errors the OCRs make. This system has been tested on ancient documents containing old characters and old fonts not used in contemporary documents. Combining the OCRs increases recognition by about 3%, whereas the ICR improves the recognition of rejected characters by more than 5%.
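The pipeline outlined in the abstract (several OCRs fused, with rejected characters passed to a specialized ICR) can be illustrated with a minimal sketch. The abstract does not state which fusion rule the authors use, so a simple majority vote with rejection is assumed here purely for illustration. The names `fuse_votes`, `recognize`, `min_agreement` and the `icr` callable are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def fuse_votes(candidates: Sequence[Optional[str]],
               min_agreement: int = 2) -> Optional[str]:
    """Fuse per-OCR outputs for one character by majority vote.

    Each entry is the label proposed by one OCR, or None if that OCR
    gave no answer. Returns None (reject) when agreement is too weak.
    The voting rule is an assumption, not the paper's method.
    """
    votes = Counter(c for c in candidates if c is not None)
    if not votes:
        return None
    ranked = votes.most_common(2)
    label, count = ranked[0]
    # Reject when too few OCRs agree on the top label.
    if count < min_agreement:
        return None
    # Reject when the two best labels are tied.
    if len(ranked) > 1 and ranked[1][1] == count:
        return None
    return label


def recognize(candidates: Sequence[Optional[str]],
              image: object,
              icr: Callable[[object], str]) -> str:
    """Use the fused OCR answer, falling back to the ICR on rejection."""
    fused = fuse_votes(candidates)
    if fused is not None:
        return fused
    # Hypothetical specialized recognizer (a convolutional network in
    # the paper); here it is any callable taking a character image.
    return icr(image)
```

In this sketch the ICR is only consulted for characters the OCR combination rejects, which matches the abstract's description of the ICR improving the recognition of rejected characters.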

https://hal.inria.fr/inria-00000366
Contributor : Hubert Cecotti
Submitted on : Tuesday, September 27, 2005 - 4:02:02 PM
Last modification on : Tuesday, April 24, 2018 - 1:30:57 PM

Citation

Hubert Cecotti, Abdel Belaïd. Hybrid OCR combination for ancient documents. Third International Conference on Advances in Pattern Recognition - ICAPR 2005, Aug 2005, Bath/UK, pp.646-653, ⟨10.1007/11551188⟩. ⟨inria-00000366⟩
