French coreference for spoken and written language

Rodrigo Wilkens; Bruno Oberle; Frédéric Landragin; Amalia Todirascu

Conference paper, Year: 2020

Rodrigo Wilkens

Role: Author

Instituto de Informática [Porto Alegre]

Bruno Oberle

Role: Author
PersonId : 1064737

Linguistique, Langues et Parole

Frédéric Landragin

Role: Author
PersonId : 5570
IdHAL : frederic-landragin
IdRef : 071347321

Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094

Amalia Todirascu

Role: Author
PersonId : 5726
IdHAL : amalia-todirascu
IdRef : 130431796

Fonctionnement Discursif et Traduction (LILPA)

Abstract

Coreference resolution aims to identify and group all mentions that refer to the same entity. For French, most systems use different setups, which makes them difficult to compare. In this paper, we present an extensive comparison of several coreference resolution systems for French. The systems were trained on two corpora (ANCOR for spoken language and Democrat for written language) annotated with coreference chains and augmented with syntactic and semantic information. The models are compared under different configurations (e.g., with and without singletons). In addition, we evaluate mention detection and coreference resolution separately. We present a full-stack model that outperforms the other approaches. This model allows us to study the impact of mention detection errors on coreference resolution. Our analysis shows that mention detection can be improved by focusing on boundary identification, while advances in detecting pronoun-noun relations can aid the coreference task. Another contribution of this work is the first end-to-end neural coreference resolution model for French trained on Democrat (written texts), which is comparable to state-of-the-art systems for spoken French.

Keywords

Coreference chains; Coreference Resolution; Evaluation; Anaphora; Automatic coreference resolution; Coreference; French language

Domains

Linguistics; Document and text processing

Main file

2020.lrec-1.10.pdf (400.02 KB)

Origin: Publisher files allowed on an open archive

https://hal.science/hal-02476902

Submitted on: Thursday, June 4, 2020, 10:07:18

Last modified on: Friday, April 19, 2024, 16:18:57

Long-term archiving on: Friday, December 4, 2020, 21:49:59

Dates and versions

hal-02476902, version 1 (04-06-2020)

Identifiers

HAL Id: hal-02476902, version 1

Cite

Rodrigo Wilkens, Bruno Oberle, Frédéric Landragin, Amalia Todirascu. French coreference for spoken and written language. Language Resources and Evaluation Conference (LREC 2020), 2020, Marseille, France. pp. 80-89. ⟨hal-02476902⟩

Collections

ENS-PARIS CNRS UNIV-PARIS3 LATTICE PSL USPC SITE-ALSACE DEMOCRAT ANR
