
Dating Ancient texts: an Approach for Noisy French Documents

Abstract : Automatic dating of ancient documents is an important area of research for digital humanities applications. Many documents available via digital libraries have no date, or only an uncertain one. Document dating is not only useful in itself; it also helps to choose the appropriate NLP tools (lemmatizer, POS tagger, etc.) for subsequent analysis. This paper provides a dataset of thousands of ancient documents in French and presents methods and evaluation metrics for this task. We compare character-level methods with token-level methods on two datasets covering two different time periods and two different text genres. Our results show that character-level models are more robust to noise than classical token-level models. The experiments presented in this article focus on documents written in French, but we believe that the ability of character-level models to handle noise would help to achieve comparable results on other languages, and on more ancient languages in particular.
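The robustness claim in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the sentences, the simulated OCR errors, the trigram size and the helper names (`char_ngrams`, `tokens`, `overlap`) are all assumptions made for illustration. The sketch only measures how many features of a clean sentence survive in a noisy copy, under a token-level and a character-level representation.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (spaces included)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def tokens(text: str) -> Counter:
    """Count whitespace-separated tokens."""
    return Counter(text.split())


def overlap(a: Counter, b: Counter) -> float:
    """Share of the features counted in `a` that also occur in `b`."""
    total = sum(a.values())
    return sum((a & b).values()) / total if total else 0.0


# Hypothetical data: a clean sentence and a copy with simulated OCR errors
# (one wrong character in four of the nine words).
clean = "le roy et la royne furent en la ville"
noisy = "le r0y et la roync furent cn la vllle"

token_overlap = overlap(tokens(clean), tokens(noisy))
char_overlap = overlap(char_ngrams(clean), char_ngrams(noisy))

print(f"token-level overlap:     {token_overlap:.2f}")
print(f"character-level overlap: {char_overlap:.2f}")
```

A single wrong character destroys the whole token it belongs to, but only the few character n-grams that span it, so the character-level representation keeps a larger share of its features in this toy case. Whether that advantage carries over to a dating classifier is what the paper evaluates; the sketch does not show it.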
Document type :
Conference papers

Contributor : Gaël Lejeune
Submitted on : Wednesday, May 13, 2020 - 7:55:53 AM
Last modification on : Thursday, December 9, 2021 - 3:48:14 AM




  • HAL Id : hal-02571633, version 1


Anaëlle Baledent, Nicolas Hiebel, Gaël Lejeune. Dating Ancient texts: an Approach for Noisy French Documents. Language Resources and Evaluation Conference (LREC) 2020, May 2020, Marseille, France. ⟨hal-02571633⟩


