Journal article, Pattern Recognition Letters, Year: 2014

Feature selection using Principal Component Analysis for massive retweet detection

Mohamed Morchid
Richard Dufour
Pierre-Michel Bousquet
Georges Linares
Juan-Manuel Torres-Moreno

Abstract

Social networks have become a major actor in massive information propagation. On the Twitter platform, this popularity is due in part to the ability to relay messages (i.e., tweets) posted by users. This mechanism, called the retweet, allows users to massively share tweets they consider potentially interesting to others. In this paper, we study the behavior of tweets that have been massively retweeted in a short period of time. We first analyze specific tweet features through a Principal Component Analysis (PCA) to better understand the behavior of highly forwarded tweets as opposed to those retweeted only a few times. We then propose to automatically detect massively retweeted messages, using this qualitative study to select the features that allow the best classification performance. We show that selecting only the most correlated features leads to the best classification performance (F-measure of 65.7%), a gain of about 2.4 points over the use of the complete set of features.
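The abstract does not spell out how the "most correlated features" are chosen, so the following is only a minimal sketch of one plausible reading: standardize the features, extract the first principal component of their correlation matrix, and keep the features whose correlation with that component is largest in absolute value. Everything named here is an assumption made for illustration: the feature names (`feat_a`, `feat_b`, `feat_noise`), the synthetic data, the helper functions, and the use of power iteration (chosen only to keep the example free of external dependencies). It is not the authors' implementation.

```python
import math
import random


def standardize(rows):
    """Center each column and scale it to unit variance (assumes no constant column)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]


def first_component(c, iters=500, seed=0):
    """Leading eigenvector and eigenvalue of a symmetric matrix via power iteration."""
    d = len(c)
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]  # random start avoids an unlucky orthogonal guess
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cv[i] for i in range(d))  # Rayleigh quotient
    return v, eigenvalue


def feature_component_correlations(rows):
    """Correlation of each standardized feature with the first principal component."""
    c = correlation_matrix(standardize(rows))
    v, eigenvalue = first_component(c)
    return [vj * math.sqrt(eigenvalue) for vj in v]


def select_features(rows, names, k):
    """Keep the k features most correlated (in absolute value) with the first component."""
    corr = feature_component_correlations(rows)
    ranked = sorted(zip(names, corr), key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Synthetic, hypothetical data: two features driven by a shared factor, one pure noise.
rng = random.Random(0)
rows = []
for _ in range(200):
    t = rng.gauss(0, 1)
    rows.append([t + 0.1 * rng.gauss(0, 1), 2 * t + 0.1 * rng.gauss(0, 1), rng.gauss(0, 1)])

names = ["feat_a", "feat_b", "feat_noise"]
selected = select_features(rows, names, k=2)
print(selected)
```

In a real pipeline the selected columns would then be passed to a classifier and compared, by F-measure, against a model trained on the full feature set; that comparison is what the abstract reports, but the classifier and the feature list are not given here.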
No file deposited

Dates and versions

hal-01319767, version 1 (23-05-2016)

Identifiers

Cite

Mohamed Morchid, Richard Dufour, Pierre-Michel Bousquet, Georges Linares, Juan-Manuel Torres-Moreno. Feature selection using Principal Component Analysis for massive retweet detection. Pattern Recognition Letters, 2014, ⟨10.1016/j.patrec.2014.05.020⟩. ⟨hal-01319767⟩

Collections

UNIV-AVIGNON LIA
192 views
0 downloads
