Conference papers

An empirical resource for discovering cognitive principles of discourse organisation: the ANNODIS corpus

Abstract : This paper describes the ANNODIS resource, a discourse-level annotated corpus for French. The corpus combines two perspectives on discourse: a bottom-up approach and a top-down approach. The bottom-up view incrementally builds a structure from elementary discourse units, while the top-down view focuses on the selective annotation of multi-level discourse structures. The corpus is composed of texts that vary in genre, length and type of discursive organisation. The methodology involves an iterative design of annotation guidelines in order to reach satisfactory levels of inter-annotator agreement. This process raises several issues relevant to the comparison of objects as complex as discourse structures. The corpus also serves as a source of empirical evidence for discourse theories. We present two initial analyses that take advantage of this new annotated corpus: one that tested hypotheses on constraints governing discourse structure, and another that studied variations in the composition and signalling of multi-level discourse structures.

Contributor : Lydia-Mai Ho-Dac
Submitted on : Wednesday, April 9, 2014 - 4:04:20 PM
Last modification on : Wednesday, September 28, 2022 - 4:20:10 PM
Long-term archiving on : Wednesday, July 9, 2014 - 12:40:10 PM


Files produced by the author(s)


  • HAL Id : hal-00976087, version 1


Stergos Afantenos, Nicholas Asher, Farah Benamara, Myriam Bras, Cécile Fabre, et al. An empirical resource for discovering cognitive principles of discourse organisation: the ANNODIS corpus. Eighth International Conference on Language Resources and Evaluation (LREC 2012), European Language Resources Association (ELRA); Evaluation and Language resources Distribution Agency (ELDA); Istituto di Linguistica Computazionale (ILC), May 2012, Istanbul, Turkey. pp.2727-2734. ⟨hal-00976087⟩


