SPARQL Graph Pattern Processing with Apache Spark

Abstract: A common way to achieve scalability for processing SPARQL queries over large RDF data sets is to use map-reduce frameworks like Hadoop or Spark. Processing complex SPARQL queries that generate large join plans over distributed data partitions is a major challenge in these shared-nothing architectures. In this article we are particularly interested in two representative distributed join algorithms, partitioned join and broadcast join, which are deployed in map-reduce frameworks for the evaluation of complex distributed graph pattern join plans. We compare five SPARQL graph pattern evaluation implementations on top of Apache Spark to illustrate the importance of carefully choosing the physical data storage layer and of being able to use both join algorithms to take advantage of existing predefined data partitionings. Our experiments with different SPARQL benchmarks over real-world and synthetic workloads show that hybrid join plans introduce more flexibility and can often achieve better performance than join plans using a single kind of join implementation.
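The two join strategies named in the abstract can be sketched in plain Python. This is a toy simulation of a shared-nothing setting (lists stand in for worker partitions), not the paper's Spark code; all function names are illustrative:

```python
# Hypothetical sketch: partitioned join vs. broadcast join over RDF triples,
# simulating a shared-nothing cluster with Python lists as "workers".

def hash_partition(triples, key_pos, n):
    """Shuffle step of a partitioned join: route each triple to one of
    n workers by hashing its join key."""
    parts = [[] for _ in range(n)]
    for t in triples:
        parts[hash(t[key_pos]) % n].append(t)
    return parts

def partitioned_join(left, right, n=2):
    """Both inputs are hash-partitioned on the join key (the subject here),
    so matching triples land on the same worker and are joined locally."""
    lparts = hash_partition(left, 0, n)
    rparts = hash_partition(right, 0, n)
    out = []
    for lpart, rpart in zip(lparts, rparts):  # one local join per worker
        index = {}
        for s, p, o in lpart:
            index.setdefault(s, []).append((p, o))
        for s, p, o in rpart:
            for lp, lo in index.get(s, []):
                out.append((s, lp, lo, p, o))
    return out

def broadcast_join(left, small_right):
    """The small input is replicated (broadcast) to every worker as a
    hash index, so the large input is never shuffled."""
    index = {}
    for s, p, o in small_right:
        index.setdefault(s, []).append((p, o))
    return [(s, p, o, rp, ro)
            for s, p, o in left
            for rp, ro in index.get(s, [])]
```

Both strategies compute the same join; they differ in communication cost: the partitioned join shuffles both inputs, while the broadcast join replicates only the small one, which is why a hybrid planner that can pick either per join may outperform a single-strategy plan.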
Document type:
Conference paper
GRADES (Graph Data-management Experiences & Systems), Workshop, SIGMOD 2017, May 2017, Chicago, United States. pp.1-7

https://hal.archives-ouvertes.fr/hal-01502519
Contributor: Bernd Amann
Submitted on: Wednesday, April 5, 2017 - 16:17:10
Last modified on: Thursday, July 5, 2018 - 14:45:37

Identifiers

  • HAL Id : hal-01502519, version 1

Citation

Hubert Naacke, Bernd Amann, Olivier Curé. SPARQL Graph Pattern Processing with Apache Spark. GRADES (Graph Data-management Experiences & Systems), Workshop, SIGMOD 2017, May 2017, Chicago, United States. pp.1-7. 〈hal-01502519〉

