Conference papers

SPARQL query processing with Apache Spark

Abstract : The number and size of linked open data graphs keep growing at a fast pace and confront semantic RDF services with problems characterized as Big Data. Distributed query processing is one of them and needs to be efficiently addressed with execution guaranteeing scalability, high availability and fault tolerance. RDF data management systems requiring these properties are rarely built from scratch but are rather designed on top of an existing engine. In this work, we consider the processing of SPARQL queries with the current state-of-the-art cluster computing engine, namely Apache Spark. We propose and compare five different query processing approaches based on different join execution models and Spark components. A detailed experimentation on real-world and synthetic data sets promotes two new approaches tailored for the RDF data model which outperform (by a factor of up to 2.4 on query execution time compared to a state-of-the-art distributed SPARQL processing engine) the other ones on all major query shapes, i.e., star, snowflake, chain and their composition.

https://hal.archives-ouvertes.fr/hal-01447387
Contributor : Bernd Amann
Submitted on : Thursday, January 26, 2017 - 6:06:56 PM
Last modification on : Wednesday, February 26, 2020 - 7:06:07 PM

Identifiers

  • HAL Id : hal-01447387, version 1
  • ARXIV : 1604.08903

Citation

Hubert Naacke, Olivier Curé, Bernd Amann. SPARQL query processing with Apache Spark. Journées Bases de Données Avancées (BDA 2016), Nov 2016, Poitiers, France. pp.24-25. ⟨hal-01447387⟩
