Conference papers

Fast SPARQL join processing between distributed streams and stored RDF graphs using bloom filters

Abstract: The growth of both real-time data generation and stored data keeps the three V's of big data (volume, velocity and variety) a constant concern. Existing RDF Stream Processing (RSP) systems have addressed the variety challenge by defining a common model for producing, transmitting and continuously querying data in RDF. On the volume and velocity side, the performance of RSP systems still needs to improve, particularly for joins between stored and streaming RDF graphs. Stored RDF data are very important in a streaming context (related ontologies, summarized RDF data, RDF data that never change or that evolve very slowly over time, etc.), but existing RSP systems such as C-SPARQL, CQELS, SPARQLstream, EP-SPARQL and Sparkwave use non-optimized and non-scalable approaches for joining stored and dynamic RDF data. These systems need to read the entire local or remote stored RDF data set while RDF data streams continuously arrive and must be processed in near real time. The resulting latency can degrade continuous processing and often causes multiple bottlenecks in the network of a distributed environment. It also makes it impractical to refresh or update the stored contents. This paper proposes an approach for distributed real-time joins between stored and streaming RDF graphs using Bloom filters. The join procedure speeds up processing by greatly reducing intermediate results, using in-memory index storage, and precomputing query partitions according to the SPARQL query variable(s) chosen as the join key between the two kinds of RDF data. Experimental evaluation results confirm the performance gains of the approach, which significantly speeds up query processing compared to current RSP techniques.
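The general idea of a Bloom-filter-assisted join can be illustrated with a minimal, single-process sketch. It is not the paper's implementation: the class `BloomFilter`, the functions `build_stored_side` and `join_stream`, the filter size, the number of hash functions, and the choice of joining the object of a streamed triple with the subject of a stored triple are all assumptions made for illustration. The sketch only shows how a compact filter built over the join-key values of the stored graph lets most non-matching stream triples be discarded before any lookup in the stored data.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def build_stored_side(stored_triples):
    """Index stored triples by subject (the assumed join key) and build the filter."""
    index = defaultdict(list)
    bloom = BloomFilter()
    for s, p, o in stored_triples:
        index[s].append((p, o))
        bloom.add(s)
    return bloom, index


def join_stream(stream_triples, bloom, index):
    """Join each streamed triple's object with stored subjects."""
    for s, p, o in stream_triples:
        if not bloom.might_contain(o):
            continue  # definitely no match: skip the stored-data lookup
        # A false positive simply finds no entry in the index.
        for stored_p, stored_o in index.get(o, ()):
            yield (s, p, o, stored_p, stored_o)


stored = [(":sensor1", ":locatedIn", ":roomA"),
          (":sensor2", ":locatedIn", ":roomB")]
stream = [(":obs1", ":madeBy", ":sensor1"),
          (":obs2", ":madeBy", ":sensor9"),
          (":obs3", ":madeBy", ":sensor2")]

bloom, index = build_stored_side(stored)
for row in join_stream(stream, bloom, index):
    print(row)
```

In a distributed setting, only the filter (a fixed-size bit array) would need to be shipped to the nodes that receive the stream, rather than the stored data set itself; how the filter and partitions are actually distributed in the paper is not described in the abstract.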

Contributor : Elisabeth Métais
Submitted on : Wednesday, February 5, 2020 - 10:13:46 PM
Last modification on : Monday, February 21, 2022 - 3:38:18 PM




Amadou Fall Dia, Zakia Kazi Aoul, Aliou Boly, Elisabeth Metais. Fast SPARQL join processing between distributed streams and stored RDF graphs using bloom filters. 2018 12th International Conference on Research Challenges in Information Science (RCIS), May 2018, Nantes, France. pp.1-12, ⟨10.1109/RCIS.2018.8406674⟩. ⟨hal-02468577⟩


