Poster communications

Un corpus libre, évolutif et versionné en entités nommées du français

Abstract: A free, evolving and versioned French named entity recognition corpus. Annotated corpora are difficult resources to build because of the high human cost they entail. Once released, they are rarely modified and tend not to evolve over time. In this article we present a free and evolving corpus annotated with named entities, based on French Wikinews articles from 2016 to 2018, for a total of 1191 articles. We briefly describe the annotation guidelines before comparing our corpus to various corpora of comparable nature. We also report an intra-annotator agreement score to estimate the stability of the annotation, and we describe the overall process used to develop the corpus.

Contributor: Yoann Dupont
Submitted on: Wednesday, January 22, 2020 - 1:31:29 PM
Last modification on: Tuesday, June 30, 2020 - 3:41:00 AM
Long-term archiving on: Thursday, April 23, 2020 - 4:16:02 PM




  • HAL Id: hal-02448590, version 1


Yoann Dupont. Un corpus libre, évolutif et versionné en entités nommées du français. TALN 2019 - Traitement Automatique des Langues Naturelles, Jul 2019, Toulouse, France. ⟨hal-02448590⟩


