Conference Paper. Year: 2016

French Wikipedia Talk Pages: Profiling and Conflict Detection

Abstract

Wikipedia is a popular and extremely useful resource for studies in both linguistics and natural language processing (Yano and Kang, 2008; Ferschke et al., 2013). This paper introduces a new language resource based on the French Wikipedia online discussion pages, the WikiTalk corpus. The publicly available corpus includes 160M words and 3M posts structured into 1M thematic sections and has been syntactically parsed with the Talismane toolkit (Urieli, 2013). In this paper, we present the first results of experiments aiming at classifying and profiling the talk pages and threads in order to determine criteria for selecting discussions with conflicts.

Domains

Linguistics
Main file
HodacLaippalaPoudatTanguy_vf.pdf (152.62 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01378349, version 1 (10-10-2016)

Identifiers

  • HAL Id: hal-01378349, version 1

Cite

Lydia-Mai Ho-Dac, Veronika Laippala, Céline Poudat, Ludovic Tanguy. French Wikipedia Talk Pages: Profiling and Conflict Detection. 4th Conference on CMC and Social Media Corpora for the Humanities, Sep 2016, Ljubljana, Slovenia. ⟨hal-01378349⟩
239 views
263 downloads
