Feature selection using Principal Component Analysis for massive retweet detection

Abstract : Social networks have become a major actor in massive information propagation. The popularity of the Twitter platform is due in part to the ability to relay messages (i.e. tweets) posted by users. This mechanism, called retweet, allows users to massively share tweets they consider potentially interesting for others. In this paper, we study the behavior of tweets that have been massively retweeted in a short period of time. We first analyze specific tweet features through a Principal Component Analysis (PCA) to better understand the behavior of highly forwarded tweets as opposed to those retweeted only a few times. We then propose to automatically detect massively retweeted messages, using this qualitative study to select the features that allow the best classification performance. We show that selecting only the most correlated features leads to the best classification accuracy (F-measure of 65.7%), a gain of about 2.4 points over the complete set of features.
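The abstract does not say which tweet features are used or how "most correlated" is measured, so the following is only a minimal sketch of one possible reading, not the authors' procedure. It assumes that PCA is run on the correlation matrix of the features together with the target variable, and that a feature is ranked by how closely its loadings on the first two components align with those of the target. The function names `pca_loadings` and `select_features`, and the synthetic data, are hypothetical.

```python
import numpy as np


def pca_loadings(data, n_components=2):
    """Correlation of each column of `data` with the top principal components."""
    # PCA on the correlation matrix, i.e. on standardized variables.
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Loading of variable j on component k: eigvec[j, k] * sqrt(eigval[k]).
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def select_features(X, y, k, n_components=2):
    """Indices of the k features whose loadings best align with the target's.

    Assumption: alignment is scored by the absolute dot product between a
    feature's loading vector and the target's loading vector.
    """
    loadings = pca_loadings(np.column_stack([X, y]), n_components)
    target = loadings[-1]                      # the target is the last column
    scores = np.abs(loadings[:-1] @ target)
    return np.argsort(scores)[::-1][:k]


# Synthetic illustration: only features 0 and 1 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=500)
selected = select_features(X, y, k=2)
```

The retained columns `X[:, selected]` could then be passed to any classifier. The abstract does not name the one used in the paper, so none is shown here.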
Document type : Journal articles

https://hal.archives-ouvertes.fr/hal-01319767
Contributor : Bibliothèque Universitaire Déposants Hal-Avignon
Submitted on : Monday, May 23, 2016 - 8:34:34 AM
Last modification on : Saturday, March 23, 2019 - 1:22:39 AM

Citation

Mohamed Morchid, Richard Dufour, Pierre-Michel Bousquet, Georges Linares, Juan-Manuel Torres-Moreno. Feature selection using Principal Component Analysis for massive retweet detection. Pattern Recognition Letters, Elsevier, 2014, ⟨10.1016/j.patrec.2014.05.020⟩. ⟨hal-01319767⟩
