Conference papers

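Mention replacement for adapting a named entity recognition corpus to a target domain (original title: Remplacement de mentions pour l'adaptation d'un corpus de reconnaissance d'entités nommées à un domaine cible)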

Abstract : Named entity recognition is a well-studied natural language processing task that is useful in a number of applications. Recently, deep-learning models have become able to solve this task with good performance. However, the datasets used to train and evaluate those models cover only a small number of domains (newswire, web). Since a model trained on one domain generally performs worse on another, performance is lower for less-covered domains. To address this issue, this article proposes a data augmentation technique for adapting a named entity recognition corpus from a source domain to a target domain in which the encountered names can differ. We apply this technique to fantasy novels and show that it can yield performance gains in that context.
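The abstract does not spell out the replacement procedure. As a rough illustration of the general idea only, the sketch below swaps each annotated mention in a BIO-tagged sentence for a name of the same entity type drawn from a target-domain list. The BIO encoding, the function name `replace_mentions`, the `replacements` dictionary, and the example names are all assumptions made for this sketch, not details taken from the paper.

```python
import random
from typing import Dict, List, Sequence, Tuple


def replace_mentions(
    tokens: Sequence[str],
    tags: Sequence[str],
    replacements: Dict[str, List[List[str]]],
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Swap each BIO-tagged mention for a random target-domain name.

    `replacements` maps an entity type (e.g. "PER") to candidate names,
    each given as a list of tokens. This structure is hypothetical.
    """
    out_tokens: List[str] = []
    out_tags: List[str] = []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            label = tag[2:]
            # Find the end of the mention: consecutive I-<label> tags.
            j = i + 1
            while j < len(tokens) and tags[j] == f"I-{label}":
                j += 1
            candidates = replacements.get(label)
            # Keep the original mention when no candidate exists for this type.
            new_mention = rng.choice(candidates) if candidates else list(tokens[i:j])
            out_tokens.extend(new_mention)
            # Re-tag the new mention, since its length may differ.
            out_tags.extend([f"B-{label}"] + [f"I-{label}"] * (len(new_mention) - 1))
            i = j
        else:
            # Non-mention tokens are copied unchanged.
            out_tokens.append(tokens[i])
            out_tags.append(tag)
            i += 1
    return out_tokens, out_tags


# Illustrative data only: a newswire-style sentence and fantasy-style names.
tokens = ["Barack", "Obama", "visited", "Paris", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
replacements = {"PER": [["Gandalf"]], "LOC": [["Minas", "Tirith"]]}

new_tokens, new_tags = replace_mentions(tokens, tags, replacements, random.Random(0))
print(new_tokens)
print(new_tags)
```

Because replacement names can be shorter or longer than the originals, the tags are regenerated for each new mention rather than copied, which keeps tokens and labels aligned in the augmented sentence.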

https://hal.archives-ouvertes.fr/hal-03651510
Contributor : Arthur Amalvy
Submitted on : Wednesday, May 18, 2022 - 5:45:47 PM
Last modification on : Friday, June 24, 2022 - 4:08:16 AM

File

TALN_2022_Remplacement_de_ment...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03651510, version 2

Citation

Arthur Amalvy, Vincent Labatut, Richard Dufour. Remplacement de mentions pour l'adaptation d'un corpus de reconnaissance d'entités nommées à un domaine cible. 29ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), Jun 2022, Avignon, France. ⟨hal-03651510v2⟩
