HAL open archive
Conference paper. Year: 2022

Between automatic and manual encoding: Towards a generic TEI model for historical prints and manuscripts

Abstract

Cultural heritage institutions today aim to digitise their collections of prints and manuscripts (Bermès 2020) and are generating ever more digital images (Gray 2009). To enrich these images, many institutions rely on standardised formats such as IIIF, which preserve as much of the source's information as possible. For textual documents, however, an image alone is not enough to exploit their full potential. Automatic text recognition now makes it possible to extract the textual content of images on a large scale. The TEI seems to provide the ideal format for capturing both the formal and the textual data of an image (Janès et al. 2021). This, however, raises a problem. To remain compatible with a range of use cases, TEI XML files must support exports to IIIF or RDF, and must therefore rest on strict data structures that can be processed automatically. Yet such a rigid structure contradicts the basic principles of philology, which call for maximum flexibility to cope with a wide variety of situations. The solution proposed by the Gallic(orpor)a project attempts to resolve this contradiction, focusing on French historical documents produced between the 15th and the 18th centuries. The project aims to enrich the digital facsimiles distributed by the French National Library (BnF).
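The tension described above, between TEI files that must rest on strict, automatable structures (with IIIF export in mind) and the flexibility philology requires, can be made concrete with a minimal Python sketch. It is not the Gallic(orpor)a model. It shows one common TEI pattern under stated assumptions: a `<facsimile>` whose `<zone>` elements carry line coordinates, linked from the transcription by the `facs` attribute of `<lb>`. The shape of the text-recognition output, the IIIF base URL and the helper names (`lines_to_tei`, `iiif_region_url`) are hypothetical. The crop URLs follow the IIIF Image API 2.x pattern `{region}/{size}/{rotation}/{quality}.{format}`, and the mandatory `<teiHeader>` is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# TEI and XML namespaces (standard URIs).
TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Serialize TEI elements without a prefix (default namespace).
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def iiif_region_url(base: str, x: int, y: int, w: int, h: int) -> str:
    """Hypothetical helper: IIIF Image API 2.x URL for a rectangular crop."""
    return f"{base}/{x},{y},{w},{h}/full/0/default.jpg"


def lines_to_tei(image_base, width, height, lines):
    """Hypothetical helper encoding text-recognition output as TEI.

    `lines` is an assumed input shape: a list of (text, (x, y, w, h))
    tuples, one per recognised line, with pixel coordinates.
    The mandatory <teiHeader> is omitted to keep the sketch short.
    """
    root = ET.Element(tei("TEI"))
    facsimile = ET.SubElement(root, tei("facsimile"))
    surface = ET.SubElement(facsimile, tei("surface"), {
        "ulx": "0", "uly": "0", "lrx": str(width), "lry": str(height),
    })
    # Whole-page image, addressed through the (hypothetical) IIIF endpoint.
    ET.SubElement(surface, tei("graphic"),
                  {"url": f"{image_base}/full/full/0/default.jpg"})

    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    ab = ET.SubElement(body, tei("ab"))
    for n, (text, (x, y, w, h)) in enumerate(lines, start=1):
        zone_id = f"line-{n}"
        # Layout layer: one <zone> per line, with its IIIF crop as a <graphic>.
        zone = ET.SubElement(surface, tei("zone"), {
            XML_ID: zone_id,
            "ulx": str(x), "uly": str(y),
            "lrx": str(x + w), "lry": str(y + h),
        })
        ET.SubElement(zone, tei("graphic"),
                      {"url": iiif_region_url(image_base, x, y, w, h)})
        # Text layer: <lb/> points back to its zone; the line text follows it.
        lb = ET.SubElement(ab, tei("lb"), {"facs": f"#{zone_id}"})
        lb.tail = text + "\n"
    return root


# Usage with invented coordinates and a placeholder image URL.
doc = lines_to_tei(
    "https://example.org/iiif/page1", 2000, 3000,
    [("Premiere ligne", (100, 200, 1500, 60)),
     ("Seconde ligne", (100, 280, 1500, 60))],
)
print(ET.tostring(doc, encoding="unicode"))
```

The layout and the text live in separate layers joined only by identifiers, which is what makes automated exports feasible. Any editorial markup added inside the lines would presumably have to preserve those links.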
TEI_CFP-3.pdf (289.69 KB)
TEI_CFP-f.pdf (34.31 MB)
Format: Presentation
Licence: CC BY - Attribution
Comment: Presentation of the project
Origin: Files produced by the author(s)

Dates and versions

hal-03780302, version 1 (19-09-2022)

Licence

Attribution - NonCommercial - ShareAlike


Cite

Ariane Pinche, Kelly Christensen, Simon Gabay. Between automatic and manual encoding: Towards a generic TEI model for historical prints and manuscripts. TEI 2022 conference: Text as data, Sep 2022, Newcastle, United Kingdom. ⟨10.5281/zenodo.7092214⟩. ⟨hal-03780302⟩