A New Database of Digits Extracted from Coins with Hard-to-Segment Foreground for OCR Evaluation

Xingyu Pan 1, Laure Tougne 1
1 imagine - Extraction de Caractéristiques et Identification
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract: Since the release date struck on a coin is an important indicator of its monetary type, recognizing the extracted digits may assist in identifying monetary types. However, digit images extracted from coins are challenging for conventional optical character recognition (OCR) methods because the foreground of such digits very often has the same color as the background. In addition, other sources of noise, including wear of the coin metal, make it more difficult to segment the character shape correctly. To address these challenges, this paper presents the CoinNUMS database for automatic digit recognition. CoinNUMS contains 3006 digit images divided into three subsets. The first subset, CoinNUMS_geni, consists of 606 digit images manually cropped from high-resolution photos of well-conserved coins taken from the GENI coin photos; the second subset, CoinNUMS_pcgs_a, consists of 1200 digit images automatically extracted from a subset of the USA_Grading numismatic database, which contains coins of varying quality; the last subset, CoinNUMS_pcgs_m, consists of 1200 digit images manually extracted from the same coin photos as CoinNUMS_pcgs_a. In CoinNUMS_pcgs_a and CoinNUMS_pcgs_m, the digit images are extracted from the release date. In CoinNUMS_geni, the digit images can come from the cropped date, the face value, or any other legend on the coin that contains digits. To show the difficulty of these databases, we have tested recognition algorithms from the literature. The database and the results of the tested algorithms will be freely available on a dedicated website.
Document type:
Journal article
Frontiers in information and communication technologies, Frontiers Media S.A., 2017, <10.3389/fict.2017.00009>

https://hal.archives-ouvertes.fr/hal-01518293
Contributor: Xingyu Pan
Submitted on: Thursday, 4 May 2017 - 14:19:40
Last modified on: Friday, 5 May 2017 - 09:10:03

Citation

Xingyu Pan, Laure Tougne. A New Database of Digits Extracted from Coins with Hard-to-Segment Foreground for OCR Evaluation. Frontiers in information and communication technologies, Frontiers Media S.A., 2017, <10.3389/fict.2017.00009>. <hal-01518293>
