Medieval manuscripts from digitization to historical analysis
Abstract
In recent years, several technologies have become far more mature and robust than scholars in medieval studies expected only a few years ago. Handwritten Text Recognition (HTR) is a prominent example, accompanied by related functionalities such as layout analysis, line segmentation, and text-image alignment. Transkribus, eScriptorium, and Arkindex are excellent tools and platforms that support manuscript analysis. Likewise, writer identification, script classification, and the automated dating or localization of handwritten artefacts are progressing rapidly. This talk will highlight how these technologies can be combined with tools from natural language processing, such as named entity recognition, stylometry, topic modelling, and text reuse identification, in order to produce new information and address historical questions. The HOME (History of Medieval Europe) project has advanced "act segmentation" in registers and cartularies (i.e. the distinction between independent texts copied one after the other), text classification, authorship attribution, and named entity recognition, all based directly on the HTR output, and it opens new ways to address diplomatic questions. The HORAE (Hours, Recognition, Analysis, Edition) project combines layout analysis, text identification, and hierarchical segmentation to further our understanding of liturgical connections and of the circulation and reception of devotional texts in the late Middle Ages.