Distributed SPARQL Query Processing: a Case Study with Apache Spark

Abstract: This chapter focuses on the problem of evaluating SPARQL queries over large resource description framework (RDF) datasets. RDF data graphs can be produced without a predefined schema, and SPARQL allows querying both schema and instance information simultaneously. The chapter presents the challenges and solutions for efficiently processing SPARQL queries, and in particular basic graph pattern (BGP) expressions. The main challenge in processing complex graph pattern queries is to optimize the join operations, which dominate the cost of all other operators. The chapter introduces a specific solution using the MapReduce framework for processing SPARQL graph patterns. It describes the use of Apache Spark and explains the importance of the physical data layers for query performance. Spark SQL translates a SQL query into an algebraic expression composed of DataFrame (DF) operators such as selection, projection and join.
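The abstract's central claim — that evaluating a BGP reduces to joining the bindings produced by its triple patterns — can be illustrated with a minimal, self-contained Python sketch. This is not Spark code and not the chapter's implementation; the helpers `match` and `join` and the toy data are invented here purely to show how shared variables between triple patterns become join keys, which is why join optimization dominates BGP processing.

```python
# Toy RDF dataset: a list of (subject, predicate, object) triples.
# Hypothetical example data, not taken from the chapter.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
]

def match(pattern, data):
    """Evaluate one triple pattern: return a binding dict per matching triple.
    Terms starting with '?' are variables; other terms must match exactly."""
    results = []
    for triple in data:
        binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if term in binding and binding[term] != value:
                    ok = False
                    break
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            results.append(binding)
    return results

def join(left, right):
    """Natural join of two binding sets on their shared variables --
    the operation a SPARQL engine must optimize for complex BGPs."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

# BGP with two patterns sharing the variable ?y:
#   ?x knows ?y .  ?y knows ?z
answers = join(match(("?x", "knows", "?y"), triples),
               match(("?y", "knows", "?z"), triples))
print(answers)  # [{'?x': 'alice', '?y': 'bob', '?z': 'carol'}]
```

In a Spark SQL setting, each `match` would correspond to a selection over a triples DataFrame and each `join` to a DataFrame join, so the physical layout of the triples table directly determines join cost.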
Document type: Book section

Contributor: Bernd Amann
Submitted on: Monday, March 25, 2019 - 2:00:18 PM
Last modification on: Wednesday, March 27, 2019 - 1:34:25 AM


Distributed under a Creative Commons Attribution - NoDerivatives 4.0 International License


  • HAL Id: hal-02078524, version 1



Bernd Amann, Olivier Curé, Hubert Naacke. Distributed SPARQL Query Processing: a Case Study with Apache Spark. NoSQL Data Models: Trends and Challenges, 1, Wiley, 2018, 9781119528227. ⟨hal-02078524⟩


