French Wikipedia Talk Pages: Profiling and Conflict Detection

Abstract: Wikipedia is a popular and extremely useful resource for studies in both linguistics and natural language processing (Yano and Kang, 2008; Ferschke et al., 2013). This paper introduces the WikiTalk corpus, a new language resource built from the online discussion pages of the French Wikipedia. The publicly available corpus contains 160M words and 3M posts organized into 1M thematic sections, and it has been syntactically parsed with the Talismane toolkit (Urieli, 2013). We present the first results of experiments that classify and profile the talk pages and threads in order to determine criteria for selecting discussions that contain conflicts.
Document type: Conference paper

Contributor: Lydia-Mai Ho-Dac
Submitted on: Monday, October 10, 2016 - 8:51:12 AM
Last modification on: Wednesday, July 10, 2019 - 1:34:22 AM
Long-term archiving on: Saturday, February 4, 2017 - 12:27:59 AM




  • HAL Id: hal-01378349, version 1



Lydia-Mai Ho-Dac, Veronika Laippala, Céline Poudat, Ludovic Tanguy. French Wikipedia Talk Pages: Profiling and Conflict Detection. 4th Conference on CMC and Social Media Corpora for the Humanities, Sep 2016, Ljubljana, Slovenia. ⟨hal-01378349⟩


