Fast SPARQL join processing between distributed streams and stored RDF graphs using bloom filters

Abstract: The growth of real-time and stored data means we must constantly consider the three V's of big data: volume, velocity and variety. Existing RDF Stream Processing (RSP) systems have addressed variety by defining a common model for producing, transmitting and continuously querying data in RDF. On the volume and velocity side, RSP systems still need better performance, particularly when joining stored and streaming RDF graphs. Stored RDF data are very important in a streaming context (related ontologies, summarized RDF data, RDF data that never change or change very slowly over time, etc.), yet existing RSP systems such as C-SPARQL, CQELS, SPARQLstream, EP-SPARQL and Sparkwave use non-optimized, non-scalable approaches to join stored and dynamic RDF data. These systems must read the entire local or remote stored RDF dataset while RDF streams keep arriving and must be processed in near real time. The resulting latency degrades continuous processing and often causes multiple bottlenecks in the network of a distributed environment. It also makes it impractical to refresh or update the stored contents. This paper proposes an approach for distributed real-time joins between stored and streaming RDF graphs using Bloom filters. The join procedure speeds up processing by greatly reducing intermediate results, storing indices in memory, and precomputing query partitions according to the SPARQL query variable(s) shared by the two kinds of RDF data. Experimental evaluation results confirm the performance gains of our approach, which significantly speeds up query processing compared with current RSP techniques.
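The abstract does not spell out the join algorithm, so the following is only a minimal sketch of the general idea of Bloom-filter-assisted stream/stored joins, not the authors' implementation. It assumes a single join variable bound to the subject position on both sides, one fixed-size Bloom filter (m = 8192 bits, k = 4 hash positions, both arbitrary choices), and hypothetical helper names `build_stored_side`, `filter_window` and `probe`. Query partitioning is not modeled.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class BloomFilter:
    """Minimal Bloom filter: m bits, k hash positions per item (illustrative sizes)."""

    def __init__(self, m: int = 8192, k: int = 4) -> None:
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Double hashing: derive k bit positions from one SHA-256 digest,
        # so positions are stable across processes (unlike Python's salted hash()).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def build_stored_side(stored: Iterable[Triple], key_pos: int = 0):
    """Hypothetical helper: in-memory index of stored triples keyed by the join
    value, plus a Bloom filter summarizing those keys."""
    bloom = BloomFilter()
    index: Dict[str, List[Triple]] = defaultdict(list)
    for t in stored:
        bloom.add(t[key_pos])
        index[t[key_pos]].append(t)
    return bloom, index


def filter_window(window: Iterable[Triple], bloom: BloomFilter, key_pos: int = 0) -> List[Triple]:
    """Hypothetical stream-side step: drop triples whose join value cannot
    match any stored triple, shrinking the intermediate results."""
    return [t for t in window if bloom.might_contain(t[key_pos])]


def probe(candidates: Iterable[Triple], index: Dict[str, List[Triple]], key_pos: int = 0):
    """Hypothetical stored-side step: exact join against the in-memory index.
    Bloom false positives simply find no partner here."""
    return [(s, t) for s in candidates for t in index.get(s[key_pos], [])]


# Usage: join sensor readings from a stream window with stored location triples.
stored = [("ex:sensor1", "ex:locatedIn", "ex:roomA"),
          ("ex:sensor2", "ex:locatedIn", "ex:roomB")]
window = [("ex:sensor1", "ex:hasValue", "21.5"),
          ("ex:sensor3", "ex:hasValue", "19.0")]
bloom, index = build_stored_side(stored)
print(probe(filter_window(window, bloom), index))  # one pair, for ex:sensor1
```

In this sketch the Bloom filter is compact (1 KiB here) and could be shipped to the node receiving the stream, so only candidate triples, rather than the whole stored graph, would need to cross the network. A Bloom filter never yields false negatives, so no valid match is lost; a false positive only costs one extra index probe.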

https://hal.archives-ouvertes.fr/hal-02468577
Contributor: Elisabeth Métais
Submitted on: Wednesday, February 5, 2020 - 10:13:46 PM
Last modification on: Saturday, February 8, 2020 - 1:27:36 AM

Citation

Amadou Fall Dia, Zakia Kazi Aoul, Aliou Boly, Elisabeth Metais. Fast SPARQL join processing between distributed streams and stored RDF graphs using bloom filters. 2018 12th International Conference on Research Challenges in Information Science (RCIS), May 2018, Nantes, France. pp.1-12, ⟨10.1109/RCIS.2018.8406674⟩. ⟨hal-02468577⟩
