Using Function Words for Authorship Attribution: Bag-Of-Words vs. Sequential Rules

Abstract : Authorship attribution is the task of identifying the author of a given document. Various style markers have been proposed in the literature for this task, and frequencies of function words have been shown to be very reliable and effective. However, although they are state of the art, they rely on the invalid bag-of-words assumption, which stipulates that a text is a set of independent words. In this contribution, we present a comparative study of two types of style markers based on function words for authorship attribution. We compare the effectiveness of sequential rules of function words, a style marker that does not rely on the bag-of-words assumption, with that of function-word frequencies, which do. Our results show that the frequencies of function words outperform the sequential rules.
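The difference between the two kinds of style marker can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the ten-word `FUNCTION_WORDS` list and the helper names are hypothetical, and counting consecutive function-word pairs is only a simple order-aware stand-in, not the sequential rule mining evaluated in the paper.

```python
import re
from collections import Counter

# Hypothetical, tiny function-word list for illustration only;
# it is not the list used in the paper.
FUNCTION_WORDS = {"the", "of", "and", "to", "a", "in", "that", "it", "but", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def function_word_frequencies(text):
    """Bag-of-words view: relative frequency of each function word.

    Word order is ignored, so any reordering of the same words
    yields the same feature vector.
    """
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter(t for t in tokens if t in FUNCTION_WORDS)
    return {w: counts[w] / len(tokens) for w in sorted(FUNCTION_WORDS)}


def function_word_bigrams(text):
    """Order-aware view: counts of consecutive pairs in the sequence of
    function words (a simplification, not sequential rule mining)."""
    seq = [t for t in tokenize(text) if t in FUNCTION_WORDS]
    return Counter(zip(seq, seq[1:]))


# Two texts made of the same words in a different order.
text_a = "the cat sat on the mat and it slept"
text_b = "it slept and the cat sat on the mat"

same_frequencies = function_word_frequencies(text_a) == function_word_frequencies(text_b)
same_bigrams = function_word_bigrams(text_a) == function_word_bigrams(text_b)
print(same_frequencies, same_bigrams)
```

The two example texts contain exactly the same words, so the frequency features cannot tell them apart, while the order-aware counts can. The sketch shows only what kind of information each representation keeps; it says nothing about which one attributes authorship better.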

https://hal.sorbonne-universite.fr/hal-01198407
Contributor : Mohamed Amine Boukhaled
Submitted on : Saturday, September 12, 2015 - 2:06:57 PM
Last modification on : Thursday, March 21, 2019 - 2:44:48 PM
Long-term archiving on : Tuesday, December 29, 2015 - 12:53:46 AM

Files produced by the author(s)

Citation

Mohamed Amine Boukhaled, Jean-Gabriel Ganascia. Using Function Words for Authorship Attribution: Bag-Of-Words vs. Sequential Rules. The 11th International Workshop on Natural Language Processing and Cognitive Science, Oct 2014, Venice, Italy. pp.115-122, ⟨10.1515/9781501501289.115⟩. ⟨hal-01198407⟩

Metrics

Record views : 194
File downloads : 290