An Author-Topic based Approach to Cluster Tweets and Mine their Location

Abstract: Social networks have become a major channel for information propagation. On the popular Twitter platform, mobile users post or relay messages from different locations. The content, meaning and location of tweets show how an event, such as the bursty “JeSuisCharlie” event that took place in France in January 2015, is perceived in different countries. This research aims to cluster tweets according to the co-occurrence of their terms, including the country, and to predict the probable country of a non-located tweet from its content. First, we present the process of collecting a large quantity of data from the Twitter website. The resulting set contains 2,189 located tweets about “Charlie”, posted from 7 to 14 January. We then describe an original method adapted from the Author-Topic (AT) model, which is itself based on Latent Dirichlet Allocation (LDA). We define a homogeneous space containing both lexical content (words) and spatial information (country). During training on one part of the sample, we build a set of clusters (topics) based on statistical relations between lexical and spatial terms. In a clustering task on the rest of the sample, we evaluate the effectiveness of the method, which reaches up to 95% correct assignments.
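The idea of a homogeneous space that mixes words and countries can be illustrated with a small sketch. The code below is not the Author-Topic or LDA model described in the abstract: it is a naive co-occurrence counter, swapped in as a stand-in, that only shows how a country can be treated as one more token next to the words of a tweet, and how a non-located tweet might then be assigned a probable country. The function names, the `country=` prefix and the toy tweets are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

COUNTRY_PREFIX = "country="  # hypothetical marker for spatial tokens


def to_joint_tokens(text, country=None):
    """Tokenize a tweet; if it is located, append its country as a pseudo-word."""
    tokens = text.lower().split()
    if country is not None:
        tokens.append(COUNTRY_PREFIX + country)
    return tokens


def train_cooccurrence(located_tweets):
    """Count, for each word, how many tweets of each country contain it."""
    word_country = defaultdict(Counter)
    for text, country in located_tweets:
        tokens = to_joint_tokens(text, country)
        words = {t for t in tokens if not t.startswith(COUNTRY_PREFIX)}
        for word in words:
            word_country[word][country] += 1
    return word_country


def predict_country(text, word_country):
    """Return the country with the highest summed co-occurrence count, or None."""
    scores = Counter()
    for word in set(to_joint_tokens(text)):
        scores.update(word_country.get(word, {}))
    if not scores:
        return None
    # Sorting first makes ties resolve alphabetically, so the result is deterministic.
    return max(sorted(scores), key=scores.get)


# Toy data, invented for this sketch (not taken from the paper's corpus).
toy_tweets = [
    ("je suis charlie paris", "FR"),
    ("liberté paris charlie", "FR"),
    ("free speech charlie london", "UK"),
]
model = train_cooccurrence(toy_tweets)
print(predict_country("paris charlie", model))
```

In the method the abstract describes, the counting step would be replaced by topic inference over the joint vocabulary, so that a country is related to words through latent topics instead of raw co-occurrence counts.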
Document type:
Journal article
Procedia Environmental Sciences, Elsevier, 2015, 27, pp.26-29. 〈http://www.sciencedirect.com/science/journal/18780296/27〉. 〈10.1016/j.proenv.2015.07.109〉

https://hal.archives-ouvertes.fr/hal-01251313
Contributor: Didier Josselin
Submitted on: Tuesday, 5 January 2016 - 23:48:57
Last modified on: Monday, 25 September 2017 - 09:47:04

Citation

Mohamed Morchid, Yonathan Portilla, Didier Josselin, Richard Dufour, Eitan Altman, et al. An Author-Topic based Approach to Cluster Tweets and Mine their Location. Procedia Environmental Sciences, Elsevier, 2015, 27, pp.26-29. 〈http://www.sciencedirect.com/science/journal/18780296/27〉. 〈10.1016/j.proenv.2015.07.109〉. 〈hal-01251313〉
