Distributed query processing over fluctuating streams

Roland Kotto Kombi 1, 2
2 BD - Base de Données
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : In a Big Data context, stream processing has become a very active research domain. In order to manage ephemeral data (Velocity) arriving at important rates (Volume), some specific solutions, denoted data stream management systems (DSMSs),have been developed. DSMSs take as inputs some queries, called continuous queries,defined on a set of data streams. Acontinuous query generates new results as long as new data arrive in input. In many application domains, data streams haveinput rates and distribution of values which change over time. These variations may impact significantly processingrequirements for each continuous query.This thesis takes place in the ANR project Socioplug (ANR-13-INFR-0003). In this context, we consider a collaborative platformfor stream processing. Each user can submit multiple continuous queries and contributes to the execution support of theplatform. However, as each processing unit supporting treatments has limited resources in terms of CPU and memory, asignificant increase in input rate may cause the congestion of the system. The problem is then how to adjust dynamicallyresource usage to processing requirements for each continuous query ? It raises several challenges : i) how to detect a need ofreconfiguration ? ii) when reconfiguring the system to avoid its congestion at runtime ?In this work, we are interested by the different processing steps involved in the treatment of a continuous query over adistributed infrastructure. From this global analysis, we extract mechanisms enabling dynamic adaptation of resource usage foreach continuous query. We focus on automatic parallelization, or auto-parallelization, of operators composing the executionplan of a continuous query. We suggest an original approach based on the monitoring of operators and an estimation ofprocessing requirements in near future. Thus, we can increase (scale-out), or decrease (scale-in) the parallelism degree ofoperators in a proactive many such as resource usage fits to processing requirements dynamically. Compared to a staticconfiguration defined by an expert, we show that it is possible to avoid the congestion of the system in many cases or to delay itin most critical cases. Moreover, we show that resource usage can be reduced significantly while delivering equivalentthroughput and result quality. We suggest also to combine this approach with complementary mechanisms for dynamic adaptation of continuous queries at runtime. These differents approaches have been implemented within a widely used DSMS and have been tested over multiple and reproductible micro-benchmarks.
Document type :
Theses
Complete list of metadatas

https://hal.archives-ouvertes.fr/tel-01932556
Contributor : Abes Star <>
Submitted on : Thursday, February 7, 2019 - 10:07:35 AM
Last modification on : Friday, May 17, 2019 - 10:32:57 AM
Long-term archiving on : Wednesday, May 8, 2019 - 1:43:09 PM

File

these.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01932556, version 2

Citation

Roland Kotto Kombi. Distributed query processing over fluctuating streams. Other [cs.OH]. Université de Lyon, 2018. English. ⟨NNT : 2018LYSEI050⟩. ⟨tel-01932556v2⟩

Share

Metrics

Record views

83

Files downloads

44