SPARQL query processing with Apache Spark

Hubert Naacke; Olivier Curé; Bernd Amann

Conference paper. Year: 2016

Hubert Naacke

Role: Author
PersonId: 9627
IdHAL: hubert-naacke
ORCID: 0000-0003-0559-9908
IdRef: 06104203X

Bases de Données (Databases)

Olivier Curé

Role: Author
PersonId: 18350
IdHAL: olivier-cure
IdRef: 153626011

Laboratoire d'Informatique Gaspard-Monge

Bernd Amann

Role: Author
PersonId: 3057
IdHAL: bernd-amann
ORCID: 0000-0002-6822-4049
IdRef: 060259418

Bases de Données (Databases)

Abstract

The number and size of linked open data graphs keep growing at a fast pace, confronting semantic RDF services with problems characterized as Big Data. Distributed query processing is one of them and needs to be addressed efficiently, with execution that guarantees scalability, high availability and fault tolerance. RDF data management systems requiring these properties are rarely built from scratch; rather, they are designed on top of an existing engine. In this work, we consider the processing of SPARQL queries with the current state-of-the-art cluster computing engine, namely Apache Spark. We propose and compare five different query processing approaches based on different join execution models and Spark components. A detailed experimentation on real-world and synthetic data sets promotes two new approaches tailored for the RDF data model, which outperform (by a factor of up to 2.4 on query execution time compared to a state-of-the-art distributed SPARQL processing engine) the other ones on all major query shapes, i.e., star, snowflake, chain and their composition.
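To make the query shapes named in the abstract concrete, here is a minimal sketch of how a star pattern and a chain pattern reduce to joins over triple patterns. It is plain single-machine Python, not Spark, and it is not one of the five approaches evaluated in the paper; the toy triples and the helper names (`match`, `hash_join`, `star_query`, `chain_query`) are assumptions introduced purely for illustration.

```python
from collections import defaultdict

# Toy RDF graph as (subject, predicate, object) triples; all data is made up.
TRIPLES = [
    ("alice", "worksAt", "lip6"),
    ("alice", "knows", "bob"),
    ("bob", "worksAt", "ligm"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "lip6"),
]


def match(triples, predicate):
    """Return (subject, object) pairs for one triple pattern ?s <predicate> ?o."""
    return [(s, o) for s, p, o in triples if p == predicate]


def hash_join(left, right, left_key, right_key):
    """Join two lists of tuples on the given positions via an in-memory hash table."""
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    return [l + r for l in left for r in index.get(l[left_key], [])]


def star_query(triples):
    """?x worksAt ?org . ?x knows ?y  (both patterns share the subject ?x)."""
    joined = hash_join(match(triples, "worksAt"), match(triples, "knows"), 0, 0)
    return sorted((x, org, y) for x, org, _, y in joined)


def chain_query(triples):
    """?x knows ?y . ?y worksAt ?org  (object of the first pattern is subject of the second)."""
    joined = hash_join(match(triples, "knows"), match(triples, "worksAt"), 1, 0)
    return sorted((x, y, org) for x, y, _, org in joined)


if __name__ == "__main__":
    print(star_query(TRIPLES))
    print(chain_query(TRIPLES))
```

In a star pattern every join is on the same variable, whereas a chain joins the object of one pattern to the subject of the next; a snowflake combines several stars linked by chain-like joins. In a distributed setting the join key also determines how data must be partitioned or moved, which is why the choice of join execution model matters.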

Keywords

RDF, SPARQL, Spark, distributed join algorithms, query optimisation

Domains

Databases [cs.DB]

Contributor: Bernd Amann

https://hal.science/hal-01447387

Submitted on: Thursday, 26 January 2017, 18:06:56

Last modified on: Thursday, 28 March 2024, 03:28:50

Dates and versions

hal-01447387, version 1 (26-01-2017)

Identifiers

HAL Id: hal-01447387, version 1
arXiv: 1604.08903

Cite

Hubert Naacke, Olivier Curé, Bernd Amann. SPARQL query processing with Apache Spark. Journées Bases de Données Avancées (BDA 2016), Nov 2016, Poitiers, France. pp.24-25. ⟨hal-01447387⟩

Collections

ENPC UPMC CNRS PARISTECH LIGM LIGM_MOA LIP6 ESIEE-PARIS SORBONNE-UNIVERSITE SU-SCIENCES UNIV-EIFFEL LIGM_BAAM JSE2024
