Journal articles

Supervised learning for the detection of negation and of its scope in French and Brazilian Portuguese biomedical corpora

Abstract: Automatic detection of negated content is often a prerequisite for information extraction systems in various domains. This task is especially important in the biomedical domain, where negation plays an important role. This work makes two main contributions. First, we work with two languages that have been poorly addressed so far: Brazilian Portuguese and French. We therefore developed new corpora for these two languages, manually annotated with negation cues and their scopes. Second, we propose supervised machine learning methods for the automatic detection of negation cues and their scopes. The methods prove robust in both languages (Brazilian Portuguese and French) and in cross-domain (general and biomedical language) contexts. The approach is also validated on English data from the state of the art: it yields very good results and outperforms other existing approaches. In addition, the application is accessible and usable online. We expect that these contributions (new annotated corpora, an application accessible online, and cross-domain robustness) will improve the reproducibility of the results and the robustness of NLP applications.

https://hal.archives-ouvertes.fr/hal-03021033
Contributor: Vincent Claveau
Submitted on: Tuesday, November 24, 2020 - 10:32:20 AM
Last modification on: Wednesday, November 3, 2021 - 8:12:45 AM
Long-term archiving on: Thursday, February 25, 2021 - 7:13:11 PM

File

supervised_learning_for_the_de...
Files produced by the author(s)

Citation

Clément Dalloux, Vincent Claveau, Natalia Grabar, Lucas Oliveira, Claudia Cabral Moro, et al. Supervised learning for the detection of negation and of its scope in French and Brazilian Portuguese biomedical corpora. Natural Language Engineering, Cambridge University Press (CUP), 2020, ⟨10.1017/S1351324920000352⟩. ⟨hal-03021033⟩
