Conference paper, Year: 2020

Enriching Geolocalized Dataset with POIs Descriptions at Large Scale

Hubert Naacke

Abstract

We present an efficient method to enrich a geolocalized dataset with contextual descriptions of Points of Interest (POIs). We implemented our solution using two large-scale datasets: YFCC and Geonames. A practical problem we encountered is the size of the data: the YFCC geolocalized dataset contains 45 million entries, which we propose to join with 12 million Geonames POIs. We show that using the Apache Spark cluster computing platform and the GeoSpark spatial join library as-is leads to inefficient computation because of the strong bias in the data. We propose a method that distributes the data non-uniformly according to this bias, which greatly improves spatial join performance. Moreover, we propose a method to select, among a set of nearby POIs, those most relevant to the YFCC entries. The resulting enriched dataset will be made publicly available and should help validate future work on large-scale POI recommendation.
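To illustrate why a non-uniform distribution helps a spatial join on biased data, here is a minimal single-machine sketch in plain Python. It is not the authors' method and does not use the Spark or GeoSpark APIs: it assumes a quadtree-style split (a swapped-in, generic technique), planar distances on (x, y) tuples instead of geodesic ones, and hypothetical names (`inside`, `split`, `skew_aware_join`, `capacity`, `radius`). Dense regions are split into smaller cells so that no cell holds more than `capacity` entries, and POIs lying within `radius` of a cell border are replicated to that cell so that no matching pair is lost.

```python
from math import hypot

def inside(box, p):
    """Half-open containment test: xmin <= x < xmax and ymin <= y < ymax."""
    xmin, ymin, xmax, ymax = box
    return xmin <= p[0] < xmax and ymin <= p[1] < ymax

def split(box, points, capacity, max_depth=16):
    """Quadtree-style split (illustrative assumption, not the paper's scheme).
    Returns [(leaf_box, leaf_points)]; each leaf holds at most `capacity`
    points unless `max_depth` is exhausted (e.g. many identical points)."""
    if len(points) <= capacity or max_depth == 0:
        return [(box, points)]
    xmin, ymin, xmax, ymax = box
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    quadrants = [(xmin, ymin, xmid, ymid), (xmid, ymin, xmax, ymid),
                 (xmin, ymid, xmid, ymax), (xmid, ymid, xmax, ymax)]
    leaves = []
    for quad in quadrants:
        subset = [p for p in points if inside(quad, p)]
        leaves.extend(split(quad, subset, capacity, max_depth - 1))
    return leaves

def skew_aware_join(entries, pois, radius, capacity):
    """Return the set of (entry, poi) pairs at planar distance <= radius."""
    if not entries:
        return set()
    xs = [p[0] for p in entries]
    ys = [p[1] for p in entries]
    # Pad the upper edges so the half-open root box contains every entry.
    root = (min(xs), min(ys), max(xs) + 1.0, max(ys) + 1.0)
    pairs = set()
    for (xmin, ymin, xmax, ymax), leaf_entries in split(root, entries, capacity):
        if not leaf_entries:
            continue
        # Replicate POIs that fall inside the leaf box grown by `radius`.
        nearby = [q for q in pois
                  if xmin - radius <= q[0] <= xmax + radius
                  and ymin - radius <= q[1] <= ymax + radius]
        # Local join: each entry belongs to exactly one leaf, so no duplicates.
        for p in leaf_entries:
            for q in nearby:
                if hypot(p[0] - q[0], p[1] - q[1]) <= radius:
                    pairs.add((p, q))
    return pairs
```

In a cluster setting each leaf would correspond to one unit of parallel work, so bounding the number of entries per leaf is what keeps the load balanced; how the paper actually derives its partitions is not stated in this abstract.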
No file deposited

Dates and versions

hal-03990440, version 1 (15-02-2023)

Identifiers

Cite

Ibrahima Gueye, Hubert Naacke, Stéphane Gançarski. Enriching Geolocalized Dataset with POIs Descriptions at Large Scale. Innovations and Interdisciplinary Solutions for Underserved Areas, Mar 2020, Nairobi, Kenya. pp.264-273, ⟨10.1007/978-3-030-51051-0_19⟩. ⟨hal-03990440⟩