Conference papers

SPARQL query processing with Apache Spark

Abstract : The number and the size of linked open data graphs keep growing at a fast pace and confront semantic RDF services with problems characterized as Big data. Distributed query processing is one of them and needs to be efficiently addressed with execution guaranteeing scalability, high availability and fault tolerance. RDF data management systems requiring these properties are rarely built from scratch but are rather designed on top of an existing engine. In this work, we consider the processing of SPARQL queries with the current state of the art cluster computing engine, namely Apache Spark. We propose and compare five different query processing approaches based on different join execution models and Spark components. A detailed experimentation on real-world and synthetic data sets promotes two new approaches tailored for the RDF data model which outperform (by a factor of up to 2.4 on query execution time compared to a state of the art distributed SPARQL processing engine) the other ones on all major query shapes, i.e., star, snowflake, chain and their composition.
Document type : Conference papers
Contributor : Bernd Amann
Submitted on : Thursday, January 26, 2017 - 6:06:56 PM
Last modification on : Saturday, January 15, 2022 - 3:58:31 AM

  • HAL Id : hal-01447387, version 1
  • ARXIV : 1604.08903


Hubert Naacke, Olivier Curé, Bernd Amann. SPARQL query processing with Apache Spark. Journées Bases de Données Avancées (BDA 2016), Nov 2016, Poitiers, France. pp.24-25. ⟨hal-01447387⟩