
A stylometric approach for opinion mining

Gaël Lejeune (1), Frédéric Dumonceaux (2)
(1) TALN, LINA - Laboratoire d'Informatique de Nantes Atlantique
(2) DUKE, LINA - Laboratoire d'Informatique de Nantes Atlantique
Abstract: This article tackles the DEFT'15 opinion mining challenge using a stylometric approach. The dataset proposed by the organizers was a set of microblog messages extracted from Twitter. We participated in three tasks: classification according to polarity (Task 1, 3 classes), classification according to information (Task 2.1, 4 classes) and classification according to specific classes (Task 3, 18 classes). The stylometric approach we used was based on recent work on Authorship Attribution using character n-grams as features. Our assumption was that the features that are efficient for characterizing an author's style would be equally efficient for identifying the opinions or emotions expressed in tweets. We showed that this assumption was wrong, especially on Task 3. It appears that stylometric features might not be well suited to opinion mining tasks. Another hypothesis to explain this result is that microblog messages might be too short to take advantage of such a stylometric approach.
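
As a rough illustration of the kind of feature the abstract refers to, the following Python sketch counts character n-grams in a message and compares two such profiles. It is not the authors' implementation: the function names, the n-gram range, and the Dice-style overlap measure are assumptions chosen for illustration only.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count every character n-gram of length n_min..n_max in text.

    The default range is an illustrative assumption, not a value
    taken from the paper.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # A string of length L contains L - n + 1 n-grams of length n.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def profile_overlap(a, b):
    """Dice-style overlap between two n-gram count profiles, in [0, 1].

    This measure is a hypothetical stand-in for whatever classifier
    or distance the authors actually used.
    """
    shared = sum((a & b).values())  # multiset intersection of the counts
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 0.0


if __name__ == "__main__":
    first = char_ngrams("so happy about this!", 3, 3)
    second = char_ngrams("not happy about that", 3, 3)
    print(sum(first.values()), profile_overlap(first, second))
```

A 20-character message contains only 18 character trigrams, so each profile is built from very few observations. That is in line with the abstract's second hypothesis about message length.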
Document type :
Conference papers

Cited literature [8 references]

https://hal.archives-ouvertes.fr/hal-01170000
Contributor : Gaël Lejeune
Submitted on : Tuesday, June 30, 2015 - 4:30:56 PM
Last modification on : Thursday, January 20, 2022 - 5:28:06 PM
Long-term archiving on : Tuesday, April 25, 2017 - 8:32:05 PM

File

deft2015_dimeco.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01170000, version 1


Citation

Gaël Lejeune, Frédéric Dumonceaux. A stylometric approach for opinion mining. Traitement Automatique des Langues Naturelles 2015, DEFT, Jun 2015, Caen, France. ⟨hal-01170000⟩
