Towards historical document indexing : extraction of drop cap letters

Mickaël Coustaty; Rudolf Pareti; Nicole Vincent; Jean-Marc Ogier

doi:10.1007/s10032-011-0152-x

Journal article, International Journal on Document Analysis and Recognition, Year: 2011

Mickaël Coustaty

Role: Author
PersonId : 2462
IdHAL : mickael-coustaty
ORCID : 0000-0002-0123-439X
IdRef : 160560268

Laboratoire Informatique, Image et Interaction - EA 2118

Rudolf Pareti

Role: Author
PersonId : 835855

Centre de Recherche en Informatique de Paris 5

Nicole Vincent

Role: Author
PersonId : 835856

Centre de Recherche en Informatique de Paris 5

Jean-Marc Ogier

Role: Author

Laboratoire Informatique, Image et Interaction - EA 2118

Abstract

This paper deals with the difficult problem of indexing ancient graphic images. It tackles the particular case of indexing drop caps (also called lettrines) and, specifically, considers the problem of extracting the letter from these complex graphic images. Based on an analysis of the features of the images to be indexed, an original strategy is proposed. The approach first filters the relevant information using a Meyer decomposition. Then, to accommodate the variability in how the information is represented, a Zipf's law model detects the regions belonging to the letter, which allows the letter to be segmented. The overall process is evaluated on a relevant set of images, and the results show that the approach is relevant.
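The abstract names Zipf's law without stating it. In its usual form, the frequency f of the r-th most frequent pattern follows f ≈ C · r^(−α), which is a straight line of slope −α on a log-log plot. The sketch below is only an illustration of that rank-frequency fit, not the authors' implementation. The function names `zipf_slope` and `row_patterns` are hypothetical, and the pattern definition (horizontal runs of three pixel values) is an assumption made for this example.

```python
import math
from collections import Counter


def zipf_slope(symbols):
    """Least-squares fit of log(frequency) = slope * log(rank) + intercept.

    `symbols` is any iterable of hashable items (hypothetical helper,
    not taken from the paper). Returns (slope, intercept).
    """
    # Frequencies sorted from most to least frequent give ranks 1, 2, 3, ...
    freqs = sorted(Counter(symbols).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct symbols")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def row_patterns(image, width=3):
    """Yield horizontal runs of `width` pixel values as tuples.

    The pattern shape is an assumption for illustration only.
    `image` is a list of rows, each a list of pixel values.
    """
    for row in image:
        for i in range(len(row) - width + 1):
            yield tuple(row[i:i + width])
```

Under these assumptions, `zipf_slope(row_patterns(region))` gives one slope estimate per image region. Regions whose slopes differ markedly from the rest would then be candidates for further inspection.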

Domains

Image processing [eess.IV]; Text and document processing; Computer vision and pattern recognition [cs.CV]

Main file

ijdar_V3_accepte.pdf (2.11 MB)

Origin: Publisher files allowed on an open archive

https://hal.science/hal-00916007

Submitted on: Monday, 9 December 2013, 15:47:11

Last modified on: Thursday, 12 May 2022, 15:37:35

Long-term archiving on: Sunday, 9 March 2014, 23:50:57

Dates and versions

hal-00916007 , version 1 (09-12-2013)

Identifiers

HAL Id : hal-00916007 , version 1
DOI : 10.1007/s10032-011-0152-x

Cite

Mickaël Coustaty, Rudolf Pareti, Nicole Vincent, Jean-Marc Ogier. Towards historical document indexing : extraction of drop cap letters. International Journal on Document Analysis and Recognition, 2011, 14 (3), pp.243-254. ⟨10.1007/s10032-011-0152-x⟩. ⟨hal-00916007⟩

Collections

UNIV-ROCHELLE ANR
