SPARQL Graph Pattern Processing with Apache Spark

Abstract: A common way to achieve scalability for processing SPARQL queries over large RDF data sets is to use map-reduce frameworks like Hadoop or Spark. Processing complex SPARQL queries, which generate large join plans over distributed data partitions, is a major challenge in these shared-nothing architectures. In this article we are particularly interested in two representative distributed join algorithms, partitioned join and broadcast join, which are deployed in map-reduce frameworks for the evaluation of complex distributed graph pattern join plans. We compare five SPARQL graph pattern evaluation implementations on top of Apache Spark to illustrate the importance of carefully choosing the physical data storage layer and of being able to combine both join algorithms to take account of existing predefined data partitionings. Our experiments with different SPARQL benchmarks over real-world and synthetic workloads show that hybrid join plans introduce more flexibility and often achieve better performance than join plans using a single kind of join implementation.
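The two join strategies contrasted in the abstract can be sketched in plain Python over a toy triple store. This is an illustrative sketch only, not the paper's implementation: the data, function names, and the choice of the subject as join key are all assumptions made for the example.

```python
from collections import defaultdict

# Toy RDF triples: (subject, predicate, object).
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
    ("bob", "age", "25"),
]

def partitioned_join(left, right, n_partitions=2):
    """Hash-partition both inputs on the join key, then join
    within each partition independently. This mimics Spark's
    shuffle (repartition) join: both sides move over the network."""
    buckets_l, buckets_r = defaultdict(list), defaultdict(list)
    for key, row in left:
        buckets_l[hash(key) % n_partitions].append((key, row))
    for key, row in right:
        buckets_r[hash(key) % n_partitions].append((key, row))
    out = []
    for p in range(n_partitions):
        index = defaultdict(list)
        for key, row in buckets_r[p]:
            index[key].append(row)
        for key, row in buckets_l[p]:
            for other in index[key]:
                out.append((key, row, other))
    return out

def broadcast_join(left, small_right):
    """Build a hash map of the small input and replicate it to
    every partition of the large one. This mimics Spark's
    broadcast (map-side) join: the large side is not shuffled."""
    index = defaultdict(list)
    for key, row in small_right:
        index[key].append(row)
    return [(key, row, other) for key, row in left for other in index[key]]

# Basic graph pattern: ?x knows ?y . ?x age ?a  -- join on ?x.
knows = [(s, o) for s, p, o in triples if p == "knows"]
ages = [(s, o) for s, p, o in triples if p == "age"]

print(sorted(partitioned_join(knows, ages)))
print(sorted(broadcast_join(knows, ages)))
```

Both strategies yield the same bindings; the trade-off the paper studies is the cost model: partitioned join shuffles both inputs, while broadcast join avoids shuffling the large side at the price of replicating the small one, which is why hybrid plans that pick per-join can win.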
Document type :
Conference papers
Contributor : Bernd Amann
Submitted on : Wednesday, April 5, 2017 - 4:17:10 PM
Last modification on : Friday, September 16, 2022 - 1:56:06 PM


  • HAL Id : hal-01502519, version 1


Hubert Naacke, Bernd Amann, Olivier Curé. SPARQL Graph Pattern Processing with Apache Spark. GRADES (Graph Data-management Experiences & Systems), Workshop, SIGMOD 2017, May 2017, Chicago, United States. pp.1-7. ⟨hal-01502519⟩