TwitCID: a Collection of Data Sets for Studies on Information Diffusion on Social Networks
Conference paper. Year: 2019

Abstract

Online social networks play a crucial role in spreading information at very large scale, and modeling information propagation on these networks has attracted considerable attention from researchers. However, none of the data sets used in past work has been made available to the research community, even though they would be very useful for comparative studies. In this paper, we describe and release TwitCID, a collection of five Twitter data sets totaling 18 million tweets. The collection is designed for evaluating methods that model information spread, covering both general information and brand marketing information. In addition to the tweet IDs and a script that retrieves each full tweet in JSON from the Twitter API, we release the values of the 29 features extracted for these data sets. The features are user-based, content-based, and temporal. Finally, we report the results of information diffusion prediction models (80% accuracy), which could serve as strong baselines for this research topic.
Main file
Hoang_26251.pdf (886.42 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02930104, version 1 (04-09-2020)

Cite

Thi Bich Ngoc Hoang, Josiane Mothe, Manon Baillon. TwitCID: a Collection of Data Sets for Studies on Information Diffusion on Social Networks. Conference and Labs of the Evaluation Forum (CLEF 2019), Sep 2019, Lugano, Switzerland. pp. 88-100, ⟨10.1007/978-3-030-28577-7_5⟩. ⟨hal-02930104⟩