Conference papers

BERT meets d'Artagnan: Data Augmentation for Robust Character Detection in Novels

Abstract : Character detection is a task of interest in digital humanities that requires solving multiple natural language processing subtasks, such as named entity recognition (NER). While recent deep-learning-based models can solve the NER task accurately, most datasets do not cover the literary domain, which leads to lower performance and domain-specific issues on literary texts. In this work, we investigate the use of a BERT model for literary NER and observe that it makes fewer errors than previously surveyed models. We further propose a simple data augmentation scheme to adapt the classic newswire corpus CoNLL-2003 to the literary domain; a model trained on the augmented dataset avoids some of these errors and achieves higher recall.

https://hal.archives-ouvertes.fr/hal-03617722
Contributor : Arthur Amalvy
Submitted on : Wednesday, June 22, 2022 - 4:25:15 PM
Last modification on : Friday, August 5, 2022 - 2:54:52 PM

Identifiers

  • HAL Id : hal-03617722, version 2

Citation

Arthur Amalvy, Vincent Labatut, Richard Dufour. BERT meets d'Artagnan: Data Augmentation for Robust Character Detection in Novels. Workshop on Computational Methods in the Humanities (COMHUM), Jun 2022, Lausanne, Switzerland. ⟨hal-03617722v2⟩
