Conference papers

Scaling up Automatic Structuring of Manuscript Sales Catalogues

Abstract : Manuscript Sales Catalogues (MSC) are highly important for authenticating documents and studying the reception of authors. Their regular publication throughout Europe since the beginning of the 19th century has raised interest in scaling up the means of automatically structuring their contents. Following successful first encoding tests with GROBID-Dictionaries [1,2] on a single MSC collection [3], this paper presents the results of more advanced tests of the system’s capacity to handle a larger corpus with MSC from different dealers, and therefore multiple layouts.

Contributor : Laurent Romary
Submitted on : Wednesday, August 28, 2019 - 1:32:08 PM
Last modification on : Wednesday, June 8, 2022 - 12:50:06 PM


Grobid Catalogues TEI 2019.pdf
Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License


  • HAL Id : hal-02272962, version 1



Lucie Rondeau Du Noyer, Simon Gabay, Mohamed Khemakhem, Laurent Romary. Scaling up Automatic Structuring of Manuscript Sales Catalogues. TEI 2019: What is text, really? TEI and beyond, Sep 2019, Graz, Austria. ⟨hal-02272962⟩


