Feature selection using Principal Component Analysis for massive retweet detection

Social networks have become a major channel for large-scale information propagation. Twitter's popularity is due in part to the ability to relay messages (i.e., tweets) posted by other users. This mechanism, called retweeting, lets users widely share the tweets they consider potentially interesting to others. In this paper, we study the behavior of tweets that are massively retweeted within a short period of time. We first analyze specific tweet features with a Principal Component Analysis (PCA) to better understand how highly forwarded tweets differ from those retweeted only a few times. We then propose to automatically detect massively retweeted messages, using this qualitative study to select the features that yield the best classification performance. We show that selecting only the most correlated features leads to the best classification performance (F-measure of 65.7%), a gain of about 2.4 points over the complete set of features.
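
The PCA-based selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the synthetic data, the choice of a single principal component, and the 0.5 threshold are all assumptions made for illustration, and "most correlated" is interpreted here as the correlation between each standardized feature and the leading principal component.

```python
import numpy as np


def pca_feature_correlations(X, n_components=1):
    """Correlation of each feature with the leading principal components."""
    # Standardize so that the PCA operates on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    R = (Z.T @ Z) / Z.shape[0]
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_components]
    # For standardized data, corr(feature j, component k) = v_jk * sqrt(lambda_k).
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def select_features(X, names, n_components=1, threshold=0.5):
    """Keep features whose strongest absolute correlation reaches the threshold."""
    loadings = pca_feature_correlations(X, n_components)
    strength = np.abs(loadings).max(axis=1)
    return [name for name, s in zip(names, strength) if s >= threshold]


# Synthetic data (hypothetical features, not taken from the paper):
# three features driven by one latent factor, plus one unrelated feature.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
names = ["followers", "hashtags", "urls", "noise"]
X = np.column_stack([
    latent + 0.3 * rng.normal(size=500),
    latent + 0.3 * rng.normal(size=500),
    latent + 0.3 * rng.normal(size=500),
    rng.normal(size=500),
])

selected = select_features(X, names)
print(selected)
```

The retained columns would then be passed to a classifier in place of the complete feature set; the abstract does not specify which classifier was used, so that step is left out here.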

Domains

Computer Science [cs]

https://hal.science/hal-01319767

Submitted on: Monday, May 23, 2016, 08:34:34

Last modified on: Friday, November 12, 2021, 11:18:05

Dates and versions

hal-01319767, version 1 (23-05-2016)

Identifiers

HAL Id: hal-01319767, version 1
DOI: 10.1016/j.patrec.2014.05.020

Cite

Mohamed Morchid, Richard Dufour, Pierre-Michel Bousquet, Georges Linares, Juan-Manuel Torres-Moreno. Feature selection using Principal Component Analysis for massive retweet detection. Pattern Recognition Letters, 2014, ⟨10.1016/j.patrec.2014.05.020⟩. ⟨hal-01319767⟩

Collections

UNIV-AVIGNON LIA