HAL open archive
Conference paper, 2019

ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents

Abstract

Segmentation is the first step in building practical discourse parsers, and is often neglected in discourse parsing studies. The goal is to identify the minimal spans of text to be linked by discourse relations, or to isolate explicit marking of discourse relations. Existing systems on English report F1 scores as high as 95%, but they generally assume gold sentence boundaries and are restricted to English newswire texts annotated within the RST framework. This article presents a generic approach and ToNy, a discourse segmenter developed for the DisRPT shared task, in which multiple discourse representation schemes, languages and domains are represented. In our experiments, we found that a straightforward sequence prediction architecture with pretrained contextual embeddings is sufficient to reach performance levels comparable to existing systems when separately trained on each corpus. We report F1 scores between 81% and 96%. We also observed that discourse segmentation models display only a moderate generalization capability, even within the same language and discourse representation scheme.
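The abstract frames segmentation as sequence prediction: each token receives a label saying whether it begins a discourse unit, and results are reported as F1. The sketch below illustrates only that framing and one simple way to score it, assuming a hypothetical two-label scheme ("B" for a segment-initial token, "O" otherwise) and F1 computed over predicted segment beginnings. The label names and the functions `split_segments` and `boundary_f1` are illustrative assumptions; this is neither ToNy's code nor the official DisRPT scorer, whose conventions may differ (for example, in how sentence-initial boundaries are counted).

```python
from typing import List, Set


def split_segments(tokens: List[str], labels: List[str]) -> List[str]:
    """Group tokens into segments, starting a new one at each "B" label (assumed scheme)."""
    segments: List[List[str]] = []
    for tok, lab in zip(tokens, labels):
        if lab == "B" or not segments:
            segments.append([tok])      # this token opens a new discourse unit
        else:
            segments[-1].append(tok)    # continue the current unit
    return [" ".join(seg) for seg in segments]


def boundary_positions(labels: List[str]) -> Set[int]:
    """Indices of tokens labelled as segment beginnings."""
    return {i for i, lab in enumerate(labels) if lab == "B"}


def boundary_f1(gold: List[str], pred: List[str]) -> float:
    """F1 over segment-initial tokens; every "B" label counts, including the first token."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted label sequences must align token by token")
    g, p = boundary_positions(gold), boundary_positions(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0                      # also avoids division by zero when nothing is predicted
    precision = tp / len(p)
    recall = tp / len(g)
    return 2 * precision * recall / (precision + recall)


tokens = ["Although", "it", "rained", ",", "we", "went", "out", "."]
gold = ["B", "O", "O", "O", "B", "O", "O", "O"]
pred = ["B", "O", "O", "B", "B", "O", "O", "O"]   # one spurious boundary at ","

print(split_segments(tokens, gold))      # ['Although it rained ,', 'we went out .']
print(round(boundary_f1(gold, pred), 3)) # 0.8
```

Here the prediction finds both gold boundaries (recall 1.0) but adds one extra (precision 2/3), giving an F1 of 0.8. Scoring boundaries rather than whole segments is a design choice of this sketch; it rewards partially correct segmentations instead of requiring exact span matches.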
Main file
21_Paper.pdf (131.14 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02374091, version 1 (21 November 2019)

Citation

Philippe Muller, Chloé Braud, Mathieu Morey. ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents. Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, Jun 2019, Minneapolis, United States, pp. 115-124. DOI: 10.18653/v1/W19-2715. ⟨hal-02374091⟩
261 views
740 downloads
