HAL open archive
Book chapter, Year: 2018

Distributed SPARQL Query Processing: a Case Study with Apache Spark

Bernd Amann
Olivier Curé
Hubert Naacke

Abstract

This chapter focuses on the problem of evaluating SPARQL queries over large Resource Description Framework (RDF) datasets. RDF data graphs can be produced without a predefined schema, and SPARQL allows querying both schema and instance information simultaneously. The chapter presents the challenges and solutions for efficiently processing SPARQL queries, in particular basic graph pattern (BGP) expressions. The main challenge in processing complex graph pattern queries is optimizing the join operations, whose cost dominates that of all other operators. The chapter introduces a specific solution that uses the MapReduce framework to process SPARQL graph patterns. It describes the use of Apache Spark and explains the importance of the physical data layers for query performance. Spark SQL translates a SQL query into an algebraic expression composed of DataFrame (DF) operators such as selection, projection, and join.
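To make the reduction of a BGP to selections, joins, and a final projection concrete, here is a minimal single-machine sketch in plain Python. It is not the chapter's implementation and does not use the Spark or Spark SQL APIs; it assumes data stored as one in-memory triples table, terms starting with `?` as variables, and patterns joined left to right in query order. All names (`select`, `join`, `project`, `evaluate_bgp`, the `ex:` terms) are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]  # SPARQL variable -> RDF term


def is_var(term: str) -> bool:
    """Assumption: terms starting with '?' are SPARQL variables."""
    return term.startswith("?")


def select(triples: Sequence[Triple], pattern: Triple) -> List[Binding]:
    """Selection: scan the triples table and bind the variables of one pattern."""
    out: List[Binding] = []
    for triple in triples:
        binding: Binding = {}
        for term, value in zip(pattern, triple):
            if is_var(term):
                # A variable repeated inside one pattern must bind to a single value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:  # constants must match exactly
                break
        else:  # no mismatch: the triple satisfies the pattern
            out.append(binding)
    return out


def join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Hash join on the variables shared by both inputs (cross product if none)."""
    if not left or not right:
        return []
    shared = sorted(left[0].keys() & right[0].keys())
    index: Dict[tuple, List[Binding]] = defaultdict(list)
    for r in right:  # build side
        index[tuple(r[v] for v in shared)].append(r)
    return [  # probe side
        {**l, **r}
        for l in left
        for r in index.get(tuple(l[v] for v in shared), [])
    ]


def project(bindings: List[Binding], variables: Sequence[str]) -> List[tuple]:
    """Projection: keep only the requested variables (duplicates are kept)."""
    return [tuple(b[v] for v in variables) for b in bindings]


def evaluate_bgp(triples, patterns, variables) -> List[tuple]:
    """Evaluate a BGP as a left-deep chain of joins in the given pattern order."""
    result = select(triples, patterns[0])
    for pattern in patterns[1:]:
        result = join(result, select(triples, pattern))
    return project(result, variables)


if __name__ == "__main__":
    triples = [
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:bob", "ex:knows", "ex:carol"),
        ("ex:alice", "ex:name", "Alice"),
        ("ex:bob", "ex:name", "Bob"),
        ("ex:carol", "ex:name", "Carol"),
    ]
    # SELECT ?x ?n WHERE { ?x ex:knows ?y . ?y ex:name ?n }
    query = [("?x", "ex:knows", "?y"), ("?y", "ex:name", "?n")]
    print(evaluate_bgp(triples, query, ["?x", "?n"]))
    # [('ex:alice', 'Bob'), ('ex:bob', 'Carol')]
```

The fixed left-to-right join order is deliberately naive: every extra triple pattern adds one more join, which is why the choice of join order and join strategy dominates query cost in a distributed engine, where each join may also require shuffling data between nodes.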

No file deposited

Dates and versions

hal-02078524 , version 1 (25-03-2019)

License

Attribution - NoDerivatives

Citation

Bernd Amann, Olivier Curé, Hubert Naacke. Distributed SPARQL Query Processing: a Case Study with Apache Spark. NoSQL Data Models: Trends and Challenges, 1, Wiley, 2018, 9781119528227. ⟨10.1002/9781119528227.ch2⟩. ⟨hal-02078524⟩