
Corpus annotation of macro discourse structures

Abstract : We present our discourse annotation project, Annodis, which aims to make available a diversified French corpus annotated with discourse information, along with a set of tools for annotation and corpus exploitation. An original aspect of the project is that it combines two theoretically and methodologically different points of view on discourse: bottom-up and top-down. In the bottom-up perspective, basic constituents are identified and linked via discourse relations. In a complementary manner, the top-down approach starts from the text as a whole and focuses on the identification of configurations of cues signalling higher-level text segments, in an attempt to address the interplay of continuity and discontinuity within discourse. The focus of this paper is the annotation scheme used in the top-down approach, which revolves around enumerative structures. These structures, which are of particular interest to our project because of their ability to occur in nested configurations and at all levels of granularity (from within a sentence to across text sections), are the discourse object chosen to "bootstrap" our approach. We describe the different stages involved: corpus selection, pre-processing and "marking" techniques, and the specific interface facilities, designed to make it possible for coders to navigate and scan the text in order to identify relevant spans at different granularity levels.
Document type :
Conference papers

Cited literature [11 references]
Contributor : Lydia-Mai Ho-Dac
Submitted on : Wednesday, April 9, 2014 - 5:23:43 PM
Last modification on : Friday, September 18, 2020 - 2:34:32 PM
Long-term archiving on : Wednesday, July 9, 2014 - 2:30:35 PM


Files produced by the author(s)


  • HAL Id : hal-00976352, version 1


Lydia-Mai Ho-Dac, Cécile Fabre, Marie-Paule Péry-Woodley, Josette Rebeyrolle. Corpus annotation of macro discourse structures. 1st International conference on corpus linguistics (CILC-09), May 2009, Murcia, Spain. ⟨hal-00976352⟩


