
Distributed SPARQL Query Processing: a Case Study with Apache Spark

Abstract : This chapter focuses on the problem of evaluating SPARQL queries over large Resource Description Framework (RDF) datasets. RDF data graphs can be produced without a predefined schema, and SPARQL allows querying both schema and instance information simultaneously. The chapter presents the challenges of, and solutions for, efficiently processing SPARQL queries, in particular basic graph pattern (BGP) expressions. The main challenge in processing complex graph pattern queries is optimizing the join operations, whose cost dominates that of all other operators. The chapter introduces a specific solution based on the MapReduce framework for processing SPARQL graph patterns. It describes the use of Apache Spark and explains the importance of the physical data layers for query performance. Spark SQL translates a SQL query into an algebraic expression composed of DataFrame (DF) operators such as selection, projection and join.
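The following is a minimal single-machine sketch in Python that makes the join-centric view of BGP evaluation concrete. It is not the chapter's Spark implementation and not Spark's API. It assumes a hypothetical triple table stored as a list of (subject, predicate, object) tuples, and it uses hypothetical helper names (`match_pattern`, `hash_join`, `evaluate_bgp`, `project`). Each triple pattern becomes a selection over the triple table. Consecutive patterns are combined with a hash join on their shared variables, and a final projection keeps the SELECT variables. This mirrors the selection, projection and join operators named above.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Binding = Dict[str, str]       # variable name (e.g. "?x") -> RDF term


def is_var(term: str) -> bool:
    # Assumed convention for this sketch: variables start with "?", as in SPARQL.
    return term.startswith("?")


def match_pattern(triples: Sequence[Triple], pattern: Triple) -> List[Binding]:
    """Selection: keep triples whose constant positions match the pattern,
    binding the pattern's variables to the matched terms."""
    results: List[Binding] = []
    for triple in triples:
        binding: Binding = {}
        matched = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                # A variable repeated inside one pattern must bind consistently.
                if binding.get(term, value) != value:
                    matched = False
                    break
                binding[term] = value
            elif term != value:
                matched = False
                break
        if matched:
            results.append(binding)
    return results


def hash_join(left: List[Binding], right: List[Binding], shared: List[str]) -> List[Binding]:
    """Natural join on the shared variables: build a hash table on `right`,
    then probe it with each binding of `left`. No shared variables -> cross product."""
    index: Dict[Tuple[str, ...], List[Binding]] = defaultdict(list)
    for r in right:
        index[tuple(r[v] for v in shared)].append(r)
    joined: List[Binding] = []
    for l in left:
        for r in index.get(tuple(l[v] for v in shared), []):
            joined.append({**l, **r})
    return joined


def evaluate_bgp(triples: Sequence[Triple], bgp: List[Triple]) -> List[Binding]:
    """Evaluate a non-empty BGP left to right as a chain of joins."""
    result = match_pattern(triples, bgp[0])
    bound = {t for t in bgp[0] if is_var(t)}
    for pattern in bgp[1:]:
        pattern_vars = {t for t in pattern if is_var(t)}
        shared = sorted(bound & pattern_vars)
        result = hash_join(result, match_pattern(triples, pattern), shared)
        bound |= pattern_vars
    return result


def project(bindings: List[Binding], variables: List[str]) -> List[Binding]:
    """Projection: keep only the variables listed in the SELECT clause."""
    return [{v: b[v] for v in variables} for b in bindings]


# Hypothetical toy dataset, for illustration only.
triples: List[Triple] = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("carol", "worksFor", "initech"),
]

# SELECT ?x ?org WHERE { ?x knows ?y . ?y worksFor ?org }
bgp = [("?x", "knows", "?y"), ("?y", "worksFor", "?org")]
answers = project(evaluate_bgp(triples, bgp), ["?x", "?org"])
print(answers)  # [{'?x': 'alice', '?org': 'acme'}, {'?x': 'bob', '?org': 'initech'}]
```

The join order here is simply the order in which the patterns are written. A distributed engine such as Spark runs a plan like this over partitioned data, where a join typically has to move bindings between machines. That is one reason join optimization and the physical data layer weigh so heavily on query performance.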
Document type : Book sections
Contributor : Bernd Amann
Submitted on : Monday, March 25, 2019 - 2:00:18 PM
Last modification on : Saturday, January 15, 2022 - 4:02:20 AM


Distributed under a Creative Commons Attribution - NoDerivatives 4.0 International License


  • HAL Id : hal-02078524, version 1


Bernd Amann, Olivier Curé, Hubert Naacke. Distributed SPARQL Query Processing: a Case Study with Apache Spark. NoSQL Data Models: Trends and Challenges, 1, Wiley, 2018, 9781119528227. ⟨hal-02078524⟩


