ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents

Philippe Muller 1 Chloé Braud 2, 3 Mathieu Morey 4
1 IRIT-MELODI - MEthodes et ingénierie des Langues, des Ontologies et du DIscours
IRIT - Institut de recherche en informatique de Toulouse
2 SYNALP - Natural Language Processing : representations, inference and semantics
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
4 SEMAGRAMME - Semantic Analysis of Natural Language
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Segmentation is the first step in building practical discourse parsers, and is often neglected in discourse parsing studies. The goal is to identify the minimal spans of text to be linked by discourse relations, or to isolate explicit marking of discourse relations. Existing systems on English report F1 scores as high as 95%, but they generally assume gold sentence boundaries and are restricted to English newswire texts annotated within the RST framework. This article presents a generic approach and a system, ToNy, a discourse segmenter developed for the DisRPT shared task where multiple discourse representation schemes, languages and domains are represented. In our experiments, we found that a straightforward sequence prediction architecture with pretrained contextual embeddings is sufficient to reach performance levels comparable to existing systems, when separately trained on each corpus. We report performance between 81% and 96% in F1 score. We also observed that discourse segmentation models only display a moderate generalization capability, even within the same language and discourse representation scheme.
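As a rough illustration of the task framing in the abstract, segmentation can be cast as token-level sequence labeling, where each token is marked as starting a new discourse unit or not, and a system is scored by F1 over the predicted boundary tokens. The sketch below is not the authors' code: the function names `segments_to_labels` and `boundary_f1`, the example sentence, and the binary 1/0 label scheme are all assumptions made for illustration, and the shared task's official scorer may differ in its details.

```python
from typing import List, Tuple


def segments_to_labels(segments: List[List[str]]) -> Tuple[List[str], List[int]]:
    """Flatten segments into tokens, labeling each segment-initial token 1 and all others 0.

    Hypothetical helper: the 1/0 scheme is an assumption, not the paper's label set.
    """
    tokens: List[str] = []
    labels: List[int] = []
    for segment in segments:
        for position, token in enumerate(segment):
            tokens.append(token)
            labels.append(1 if position == 0 else 0)
    return tokens, labels


def boundary_f1(gold: List[int], pred: List[int]) -> float:
    """F1 over tokens labeled 1 (segment starts), ignoring the majority 0 class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented example: three discourse units in one sentence.
gold_segments = [
    ["The", "report", "says"],
    ["that", "sales", "rose"],
    ["because", "demand", "grew", "."],
]
tokens, gold = segments_to_labels(gold_segments)

# A made-up prediction that finds two boundaries and misplaces the third.
pred = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]

print(list(zip(tokens, gold)))
print(round(boundary_f1(gold, pred), 3))
```

In this framing, a sequence model only has to produce one label per token, which is why a generic tagger over contextual embeddings can be applied to full documents without assuming gold sentence boundaries.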
Document type : Conference papers
Cited literature [41 references]

https://hal.archives-ouvertes.fr/hal-02374091
Contributor : Chloé Braud
Submitted on : Thursday, November 21, 2019 - 1:43:18 PM
Last modification on : Tuesday, September 8, 2020 - 10:16:03 AM

File

21_Paper.pdf
Files produced by the author(s)

Citation

Philippe Muller, Chloé Braud, Mathieu Morey. ToNy: Contextual embeddings for accurate multilingual discourse segmentation of full documents. Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, Jun 2019, Minneapolis, United States. pp.115-124, ⟨10.18653/v1/W19-2715⟩. ⟨hal-02374091⟩
