Journal articles

Location extraction from tweets

Abstract: Five hundred million tweets are posted daily, making Twitter a major social media platform from which topical information on events can be extracted. These events are represented by three main dimensions: time, location and entity-related information. The focus of this paper is location, which is an essential dimension for geo-spatial applications, whether for helping rescue operations during a disaster or for contextual recommendations. While the first type of application needs high recall, the second is more precision-oriented. This paper studies the recall/precision trade-off, combining different methods to extract locations. In the context of short posts, applying tools that have been developed for natural language is not sufficient, given that tweets are generally too short to be linguistically correct. Also bearing in mind the high number of posts that need to be handled, we hypothesize that predicting whether a post contains a location or not could make the location extractors more focused and thus more effective. We introduce a model to predict whether a tweet contains a location or not and show that location prediction is a useful pre-processing step for location extraction. We define a number of new tweet features and we conduct an intensive evaluation. Our findings are that (1) combining existing location extraction tools is effective for precision-oriented or recall-oriented results, (2) enriching tweet representation is effective for predicting whether a tweet contains a location or not, (3) words appearing in a geography gazetteer and the occurrence of a preposition just before a proper noun are the two most important features for predicting the occurrence of a location in tweets, and (4) the accuracy of location extraction improves when it is possible to predict that there is a location in a tweet.
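The pre-filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the paper trains a predictor on many tweet features, whereas this example hard-codes a simple rule over the two features the abstract names as most important. The gazetteer and preposition lists are tiny hypothetical stand-ins, capitalization is used as a crude proxy for a proper-noun tag, and all function names are invented for illustration.

```python
import re

# Hypothetical toy resources; a real system would load a full geography
# gazetteer and use a part-of-speech tagger instead of capitalization.
GAZETTEER = {"paris", "toulouse", "london", "texas"}
PREPOSITIONS = {"in", "at", "from", "to", "near"}

TOKEN_RE = re.compile(r"[A-Za-z]+")


def tweet_features(text):
    """Compute the two features highlighted in the abstract for one tweet."""
    tokens = TOKEN_RE.findall(text)
    # Feature 1: some word of the tweet appears in the gazetteer.
    in_gazetteer = any(t.lower() in GAZETTEER for t in tokens)
    # Feature 2: a preposition directly precedes a capitalized word
    # (assumption: capitalization approximates "proper noun").
    prep_before_proper_noun = any(
        prev.lower() in PREPOSITIONS and cur[0].isupper()
        for prev, cur in zip(tokens, tokens[1:])
    )
    return {
        "in_gazetteer": in_gazetteer,
        "prep_before_proper_noun": prep_before_proper_noun,
    }


def may_contain_location(text):
    """Rule-based stand-in for the trained location-presence predictor."""
    features = tweet_features(text)
    return features["in_gazetteer"] or features["prep_before_proper_noun"]


def extract_locations(text):
    """Toy gazetteer-lookup extractor, run only on tweets that pass the filter."""
    if not may_contain_location(text):
        return []  # skip the extractor entirely for unlikely tweets
    return [t for t in TOKEN_RE.findall(text) if t.lower() in GAZETTEER]


if __name__ == "__main__":
    for tweet in ("Flooding reported in Toulouse tonight", "so tired today lol"):
        print(tweet, "->", extract_locations(tweet))
```

In this sketch the filter only saves work, because the toy extractor is itself a gazetteer lookup. With a heavier extractor, such as a named-entity recognizer, the same gating step would decide which tweets are worth passing on.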
https://hal.archives-ouvertes.fr/hal-02640811
Contributor: Open Archive Toulouse Archive Ouverte (oatao)
Submitted on: Thursday, May 28, 2020 - 3:22:13 PM
Last modification on: Tuesday, September 8, 2020 - 10:42:05 AM

File

Hoang_22120.pdf
Files produced by the author(s)

Citation

Thi Bich Ngoc Hoang, Josiane Mothe. Location extraction from tweets. Information Processing and Management, Elsevier, 2018, 54 (2), pp.129-144. ⟨10.1016/j.ipm.2017.11.001⟩. ⟨hal-02640811⟩
