Efficient Graph-Oriented Summary for Optimized Resource Description Framework Streams Processing Using Extended Centrality Measures

Abstract : Existing RDF Stream Processing (RSP) systems allow continuous processing of RDF data originating from different application domains, such as weather stations measuring phenomena, geolocation, IoT applications, and drinking water distribution management. However, the processing window often expires before the entire session is finished, and RSP systems delete the data streams immediately after each window is processed. This mechanism does not allow optimized exploitation of RDF data streams: the most relevant information in the data is often not used in due time and is almost impossible to exploit for further analyses. It would be better to keep the most informative part of the data within streams while minimizing memory storage space. In this work, we propose an RDF graph summarization system based on explicitly and implicitly expressed needs, built on three main approaches: (1) an approach that processes user queries (SPARQL) to extract their needs and group them into a more global query, (2) an extension of the closeness centrality measure from Social Network Analysis (SNA) to determine the most informative parts of the graph, and (3) an RDF graph summarization technique combining the extracted user query needs and the extended centrality measure. Experiments and evaluations show efficient results in terms of memory storage space, and approximate query results on the summarized graphs that are close to those expected from the source graphs.
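As a rough illustration of approach (2), the sketch below computes the standard closeness centrality (not the extended measure proposed in the paper, whose definition is not given in this abstract) over an undirected view of a few RDF triples, then keeps only the triples whose endpoints are among the highest-ranked nodes. The function names `closeness_centrality` and `summarize`, the `keep` parameter, and the sample triples are hypothetical and chosen only for this example.

```python
from collections import deque


def closeness_centrality(triples):
    """Standard closeness centrality on an undirected view of RDF triples.

    Assumption: predicates are ignored and every triple is one undirected edge.
    """
    adjacency = {}
    for subject, _predicate, obj in triples:
        adjacency.setdefault(subject, set()).add(obj)
        adjacency.setdefault(obj, set()).add(subject)

    scores = {}
    for source in adjacency:
        # Breadth-first search gives shortest path lengths from `source`.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total = sum(distance.values())
        reachable = len(distance) - 1
        # Closeness = reachable nodes / sum of distances to them.
        scores[source] = reachable / total if total else 0.0
    return scores


def summarize(triples, keep):
    """Keep triples whose subject and object are both among the `keep` most central nodes."""
    scores = closeness_centrality(triples)
    # Ties are broken by node name so the result is deterministic.
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    top = set(ranked[:keep])
    return [t for t in triples if t[0] in top and t[2] in top]


# Hypothetical sample data, loosely inspired by the weather station domain.
triples = [
    ("station1", "hasSensor", "sensorA"),
    ("station1", "hasSensor", "sensorB"),
    ("sensorA", "observes", "temperature"),
    ("sensorB", "observes", "humidity"),
    ("station1", "locatedIn", "paris"),
]

scores = closeness_centrality(triples)
summary = summarize(triples, keep=3)
```

With this sample data, `station1` is the most central node, and the summary retains only the two `hasSensor` triples. A query-aware summary in the spirit of approach (3) would additionally weight nodes by the needs extracted from user queries, which this sketch does not attempt.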

https://hal.archives-ouvertes.fr/hal-02468605
Contributor : Elisabeth Métais
Submitted on : Thursday, February 6, 2020 - 12:17:30 AM
Last modification on : Saturday, February 8, 2020 - 1:27:38 AM

Identifiers

  • HAL Id : hal-02468605, version 1

Citation

Amadou Fall Dia, M. Ulbricht, Aliou Boly, Zakia Kasi-Aoul, Elisabeth Métais. Efficient Graph-Oriented Summary for Optimized Resource Description Framework Streams Processing Using Extended Centrality Measures. ICDM 2018 : 20th International Conference on Data Mining, Jul 2018, Istanbul, Turkey. pp. 1430-1441. ⟨hal-02468605⟩
