Conference paper, Year: 2020

Impact Analysis of Document Digitization on Event Extraction

Abstract

This paper tackles the task of epidemiological event extraction applied to digitized documents. Event extraction is an information extraction task that focuses on identifying event mentions in textual data. In the context of event-based health surveillance from digitized documents, several key issues remain challenging despite considerable effort. First, image documents are indexed through their digitized version and may therefore contain numerous errors, e.g. misspellings. Second, it is important to address international news, which implies the inclusion of multilingual data. To clarify these important aspects of how to extract epidemic-related events, it remains necessary to maximize the use of digitized data. In this paper, we investigate the impact of working with digitized multilingual documents with different levels of synthetic noise on the performance of an event extraction system. To our knowledge, this type of analysis has not been addressed in previous research.
Main file
paper28.pdf (762.79 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-03026148, version 1 (26-11-2020)

Identifiers

  • HAL Id: hal-03026148, version 1

Cite

Nhu Khoa Nguyen, Emanuela Boroş, Gaël Lejeune, Antoine Doucet. Impact Analysis of Document Digitization on Event Extraction. 4th Workshop on Natural Language for Artificial Intelligence (NL4AI 2020) co-located with the 19th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2020), Nov 2020, Virtual, Italy. pp.17-28. ⟨hal-03026148⟩
