ANCOR_Centre, a Large Free Spoken French Coreference Corpus: Description of the Resource and Reliability Measures

This article presents ANCOR_Centre, a French coreference corpus available under a Creative Commons licence. With a size of around 500,000 words, the corpus is large enough to serve the needs of data-driven approaches in NLP and is one of the largest coreference resources currently available. The corpus focuses exclusively on spoken language and aims to represent a variety of spoken genres. ANCOR_Centre includes anaphora as well as coreference relations involving nominal and pronominal mentions. The paper describes in detail the annotation scheme and the reliability measures computed on the resource.

Keywords

French spoken language free annotated corpus coreference anaphora

Domains

Computation and Language [cs.CL]

Main file

2014_LREC_ANCOR.pdf (263.46 KB)

Origin: Files produced by the author(s)

Contributor: Jean-Yves Antoine

https://hal.science/hal-01075679

Submitted on: Sunday, 19 October 2014, 15:57:57

Last modified on: Friday, 16 February 2024, 18:16:04

Long-term archiving on: Tuesday, 20 January 2015, 10:44:20

Dates and versions

hal-01075679, version 1 (19-10-2014)

Identifiers

HAL Id: hal-01075679, version 1

Cite

Judith Muzerelle, Anaïs Lefeuvre, Emmanuel Schang, Jean-Yves Antoine, Aurore Pelletier, et al. ANCOR_Centre, a Large Free Spoken French Coreference Corpus: description of the Resource and Reliability Measures. LREC'2014, 9th Language Resources and Evaluation Conference, May 2014, Reykjavik, Iceland. pp. 843-847. ⟨hal-01075679⟩

Collections

INSTITUT-TELECOM EC-PARIS UNIV-RENNES1 UNIV-TOURS CNRS INRIA UNIV-ORLEANS INSA-RENNES IRISA MSL MSL-THESE UBS IRISA_UBS IRISA-D6 LIBDTLN LLL UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES LIFAT INSA-GROUPE INSA-CVL UR1-MATH-NUM
