A Comparison of Systems to Large-Scale Data Access.

Amin Mesmoudi 1, Mohand-Said Hacid 1
1 BD - Base de Données
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : With the amount of data produced in several application domains, it is increasingly difficult to manage and query the related large data repositories. Within the PetaSky project, we focus on the problem of managing scientific data in the field of cosmology. The data we consider are those of the LSST project. The overall expected size of the database that will be produced will exceed 60 PB. This paper presents preliminary results of experiments conducted on the PT1.1 and PT1.2 data sets in order to compare the performance of centralized and distributed database management systems. For centralized systems, we deployed three different DBMSs: MySQL, PostgreSQL and DBMS-X (a commercial relational database). For distributed systems, we deployed HadoopDB and Hive. The goal of these experiments is to report on the ability of these systems to support large-scale declarative queries. We mainly investigate the impact of data partitioning, indexing and compression on query execution performance.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01313176
Contributor : Équipe Gestionnaire Des Publications Si Liris
Submitted on : Monday, May 9, 2016 - 4:11:35 PM
Last modification on : Tuesday, February 26, 2019 - 11:49:56 AM

Citation

Amin Mesmoudi, Mohand-Said Hacid. A Comparison of Systems to Large-Scale Data Access. Database Systems for Advanced Applications - 19th International Conference, DASFAA 2014, International Workshop: BDMA, Revised Selected Papers, Apr 2014, Bali, Indonesia. pp. 161-175, ⟨10.1007/978-3-662-43984-5_12⟩. ⟨hal-01313176⟩
