Distributed SPARQL Query Processing: a Case Study with Apache Spark

Abstract: This chapter focuses on the problem of evaluating SPARQL queries over large resource description framework (RDF) datasets. RDF data graphs can be produced without a predefined schema, and SPARQL allows querying both schema and instance information simultaneously. The chapter presents the challenges and solutions for efficiently processing SPARQL queries, and in particular basic graph pattern (BGP) expressions. The main challenge in processing complex graph pattern queries is to optimize the join operations, which dominate the cost of all other operators. The chapter introduces a specific solution using the MapReduce framework for processing SPARQL graph patterns. It describes the use of Apache Spark and explains the importance of the physical data layers for query performance. Spark SQL translates a SQL query into an algebraic expression composed of DataFrame (DF) operators such as selection, projection and join.
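The abstract's central claim — that evaluating a BGP reduces to joining the bindings produced by its triple patterns — can be illustrated with a minimal, self-contained Python sketch. This is not Spark code and not the chapter's implementation; the helpers `match` and `join` and the toy data are invented here purely to show how shared variables between triple patterns become join keys, which is why join optimization dominates BGP processing.

```python
# Toy RDF dataset: a list of (subject, predicate, object) triples.
# Hypothetical example data, not taken from the chapter.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
]

def match(pattern, data):
    """Evaluate one triple pattern: return a binding dict per matching triple.
    Terms starting with '?' are variables; other terms must match exactly."""
    results = []
    for triple in data:
        binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if term in binding and binding[term] != value:
                    ok = False
                    break
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            results.append(binding)
    return results

def join(left, right):
    """Natural join of two binding sets on their shared variables --
    the operation a SPARQL engine must optimize for complex BGPs."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

# BGP with two patterns sharing the variable ?y:
#   ?x knows ?y .  ?y knows ?z
answers = join(match(("?x", "knows", "?y"), triples),
               match(("?y", "knows", "?z"), triples))
print(answers)  # [{'?x': 'alice', '?y': 'bob', '?z': 'carol'}]
```

In a Spark SQL setting, each `match` would correspond to a selection over a triples DataFrame and each `join` to a DataFrame join, so the physical layout of the triples table directly determines join cost.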
Document type: Book section

Contributor: Bernd Amann
Submitted on: Monday, March 25, 2019 - 2:00:18 PM
Last modification on: Wednesday, March 27, 2019 - 1:34:25 AM


Distributed under a Creative Commons Attribution - NoDerivatives 4.0 International License


  • HAL Id: hal-02078524, version 1



Bernd Amann, Olivier Curé, Hubert Naacke. Distributed SPARQL Query Processing: a Case Study with Apache Spark. NoSQL Data Models: Trends and Challenges, 1, Wiley, 2018, 9781119528227. ⟨hal-02078524⟩


