Preprints, Working Papers, ...

Declarative parallel query processing on large scale astronomical databases

Amin Mesmoudi 1
1 BD - Base de Données
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : This work is carried out in the framework of the PetaSky project. The objective of this project is to provide a set of tools for managing petabytes of data from astronomical observations. Our work is concerned with the design of a scalable approach. We first analyzed the ability of MapReduce-based systems that support SQL to manage the LSST data and to offer optimization capabilities for certain types of queries. We analyzed the impact of data partitioning, indexing and compression on query performance. Our experiments show that there is no “magic” technique for partitioning, storing and indexing data; rather, the efficiency of dedicated techniques depends mainly on the type of queries and the typology of the data considered. Based on our benchmarking work, we identified some techniques to be integrated into large-scale data management systems. We designed a new system that supports multiple partitioning mechanisms and several evaluation operators. We used the BSP (Bulk Synchronous Parallel) model as the parallel computation paradigm. Unlike the MapReduce model, we send intermediate results to workers that can continue their processing. Data is logically represented as a graph. Queries are evaluated by exploring the data graph using forward and backward edges. We also offer a semi-automatic partitioning approach, i.e., we provide the system administrator with a set of tools for choosing how to partition the data using the schema of the database and domain knowledge. The first experiments show that our approach provides a significant performance improvement with respect to MapReduce systems.
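To make the evaluation idea in the abstract more concrete, here is a minimal, single-process sketch of exploring a partitioned data graph in BSP-style supersteps. It is an illustration under stated assumptions, not the system described above: the function name `bsp_path_query`, the `owner` partitioning rule, the edge label `observed_in`, and the sample data are all hypothetical, and real workers would run in parallel rather than in a loop.

```python
from collections import defaultdict


def owner(vertex, num_workers):
    # Hypothetical partitioning rule: a deterministic hash of the vertex name.
    return sum(map(ord, str(vertex))) % num_workers


def bsp_path_query(edges, start_vertices, steps, num_workers=2):
    """Follow a sequence of (label, direction) steps over a partitioned graph.

    edges: iterable of (source, label, target) triples.
    steps: list of (label, direction), with direction "fwd" or "bwd".
    Returns the set of vertices reached after the last step.
    """
    # Each worker indexes the forward and backward edges of the vertices it owns.
    forward = [defaultdict(list) for _ in range(num_workers)]
    backward = [defaultdict(list) for _ in range(num_workers)]
    for src, label, dst in edges:
        forward[owner(src, num_workers)][(src, label)].append(dst)
        backward[owner(dst, num_workers)][(dst, label)].append(src)

    # Initial messages: route each start vertex to the worker that owns it.
    inbox = defaultdict(set)
    for v in start_vertices:
        inbox[owner(v, num_workers)].add(v)

    for label, direction in steps:  # one superstep per query step
        outbox = defaultdict(set)
        for w in range(num_workers):  # local computation on each worker
            index = forward[w] if direction == "fwd" else backward[w]
            for v in inbox.get(w, ()):
                for neighbour in index.get((v, label), ()):
                    # Intermediate results go to the worker owning the
                    # neighbour, which continues processing in the next step.
                    outbox[owner(neighbour, num_workers)].add(neighbour)
        inbox = outbox  # barrier: messages become visible in the next superstep

    return set().union(*inbox.values()) if inbox else set()


# Made-up example: which objects share an image with obj1?
edges = [
    ("obj1", "observed_in", "img1"),
    ("obj2", "observed_in", "img1"),
    ("obj3", "observed_in", "img2"),
]
result = bsp_path_query(edges, {"obj1"}, [("observed_in", "fwd"), ("observed_in", "bwd")])
```

In this sketch the first superstep follows forward edges from objects to images, and the second follows the same edges backward from images to objects, so intermediate results are handed from worker to worker instead of being written out between separate jobs.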
Document type :
Preprints, Working Papers, ...
Contributor : Équipe Gestionnaire Des Publications Si Liris
Submitted on : Monday, April 24, 2017 - 10:19:38 AM
Last modification on : Tuesday, June 1, 2021 - 2:08:08 PM


  • HAL Id : hal-01512622, version 1


Amin Mesmoudi. Declarative parallel query processing on large scale astronomical databases. 2015. ⟨hal-01512622⟩


