Conference paper. Year: 2016

SPARQL query processing with Apache Spark

Hubert Naacke
Olivier Curé
Bernd Amann

Abstract

The number and size of linked open data graphs keep growing at a fast pace and confront semantic RDF services with problems characterized as Big Data. Distributed query processing is one of them and needs to be efficiently addressed with execution guaranteeing scalability, high availability and fault tolerance. RDF data management systems requiring these properties are rarely built from scratch but are rather designed on top of an existing engine. In this work, we consider the processing of SPARQL queries with the current state-of-the-art cluster computing engine, namely Apache Spark. We propose and compare five different query processing approaches based on different join execution models and Spark components. A detailed experimentation on real-world and synthetic data sets promotes two new approaches tailored for the RDF data model which outperform (by a factor of up to 2.4 on query execution time compared to a state-of-the-art distributed SPARQL processing engine) the other ones on all major query shapes, i.e., star, snowflake, chain and their composition.
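
To make the query shapes named in the abstract concrete, the following is a minimal single-machine sketch in plain Python. It is not the paper's implementation and does not use Spark; the toy triples, the `match`, `hash_join` and `evaluate` helpers, and the two example patterns are all assumptions introduced here for illustration. It shows that a star query joins triple patterns on a shared subject variable, whereas a chain query joins the object of one pattern to the subject of the next.

```python
from collections import defaultdict

# Toy RDF graph as (subject, predicate, object) triples. All data is made up.
TRIPLES = [
    ("alice", "worksFor", "acme"),
    ("alice", "livesIn", "paris"),
    ("bob", "worksFor", "acme"),
    ("bob", "livesIn", "poitiers"),
    ("acme", "locatedIn", "paris"),
]


def match(pattern, triples):
    """Return one binding dict per triple that matches a triple pattern.

    Terms starting with '?' are variables; any other term is a constant.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated inside one pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


def hash_join(left, right):
    """Join two lists of bindings on the variables they have in common."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    # Build phase: index the left side by the values of the shared variables.
    index = defaultdict(list)
    for b in left:
        index[tuple(b[v] for v in shared)].append(b)
    # Probe phase: look up each right-side binding and merge the matches.
    joined = []
    for b in right:
        for a in index.get(tuple(b[v] for v in shared), []):
            joined.append({**a, **b})
    return joined


def evaluate(patterns, triples):
    """Evaluate a basic graph pattern as a left-deep sequence of hash joins."""
    bindings = match(patterns[0], triples)
    for pattern in patterns[1:]:
        bindings = hash_join(bindings, match(pattern, triples))
    return bindings


# Star shape: both patterns share the subject variable ?p.
STAR = [("?p", "worksFor", "?c"), ("?p", "livesIn", "?city")]
# Chain shape: the object ?c of the first pattern is the subject of the second.
CHAIN = [("?p", "worksFor", "?c"), ("?c", "locatedIn", "?city")]

star_result = evaluate(STAR, TRIPLES)
chain_result = evaluate(CHAIN, TRIPLES)
```

A distributed engine has to decide, for each such join, how bindings are partitioned and moved across the cluster; the abstract's "join execution models" refer to choices of that kind, which this local sketch deliberately leaves out.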

Dates and versions

hal-01447387, version 1 (26-01-2017)

Identifiers

Cite

Hubert Naacke, Olivier Curé, Bernd Amann. SPARQL query processing with Apache Spark. Journées Bases de Données Avancées (BDA 2016), Nov 2016, Poitiers, France. pp.24-25. ⟨hal-01447387⟩