Conference paper Year: 2020

Books of Hours: the First Liturgical Corpus for Text Segmentation

Abstract

The Book of Hours was the bestseller of the late Middle Ages and Renaissance. It is an invaluable historical treasure, documenting the devotional practices of Christians in the late Middle Ages. Up to now, its textual content has been scarcely studied because of its manuscript nature, its length and its complex content. At first glance, it looks too standardized. However, the study of books of hours raises important challenges: (i) in image analysis, its often lavish ornamentation (illegible painted initials, line-fillers, etc.), abbreviated words and multilingualism are difficult to address in Handwritten Text Recognition (HTR); (ii) its hierarchical, entangled structure offers a new field of investigation for text segmentation; (iii) in digital humanities, its textual content gives opportunities for historical analysis. In this paper, we provide the first corpus of books of hours, which consists of Latin transcriptions of 300 books of hours generated by HTR, the counterpart of Optical Character Recognition (OCR) for handwritten rather than printed texts. We designed a structural scheme of the book of hours and manually annotated two books of hours according to this scheme. Lastly, we performed a systematic evaluation of the main state-of-the-art text segmentation approaches.
Main file
2020.lrec-1.97.pdf (2.21 MB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-02931294, version 1 (05-09-2020)

Identifiers

  • HAL Id: hal-02931294, version 1

Cite

Amir Hazem, Béatrice Daille, Marie-Laurence Bonhomme, Martin Maarand, Mélodie Boillet, et al. Books of Hours: the First Liturgical Corpus for Text Segmentation. 12th Language Resources and Evaluation Conference, May 2020, Marseille (Virtual), France. pp. 776-784. ⟨hal-02931294⟩
264 views
65 downloads
