Parallel Processing of Group-By Join Queries on Shared Nothing Machines

Mohamad Al Hajj Hassan; Mostafa Bamha

doi:10.1007/978-3-540-70621-2_19

Book Sections, Year: 2008

Mohamad Al Hajj Hassan

Laboratoire d'Informatique Fondamentale d'Orléans

Mostafa Bamha

Laboratoire d'Informatique Fondamentale d'Orléans

Abstract

SQL queries involving join and group-by operations are frequently used in decision support applications. In these applications, the input relations are usually very large, so parallelizing these queries is highly recommended in order to obtain an acceptable response time. The main drawbacks of existing parallel algorithms for this kind of query are that they are very sensitive to data skew and incur expensive communication and Input/Output costs when evaluating the join operation. In this paper, we present an algorithm that minimizes the communication cost by performing the group-by operation before redistribution, so that only tuples that will be present in the join result are redistributed. In addition, it evaluates the query without materializing the result of the join operation, thus reducing the Input/Output cost of join intermediate results. The performance of this algorithm is analyzed using the scalable and portable BSP (Bulk Synchronous Parallel) cost model, which predicts a near-linear speed-up even for highly skewed data.
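The general idea of aggregating before redistribution can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the paper's algorithm: it assumes a query of the form `SELECT R.k, SUM(R.v * S.w) FROM R, S WHERE R.k = S.k GROUP BY R.k`, where the group-by attribute is the join attribute, so that the sum over joined pairs equals the product of the per-key sums. The function names (`local_group_sum`, `redistribute`, `groupby_join`) and the representation of processors as Python lists are hypothetical. The set of joinable keys is computed centrally here for brevity; on a real shared-nothing machine that step would itself require communication.

```python
from collections import defaultdict


def local_group_sum(fragment):
    """Partial aggregation on one simulated processor: key -> sum of values."""
    acc = defaultdict(int)
    for key, value in fragment:
        acc[key] += value
    return dict(acc)


def redistribute(partials, p, keep):
    """Hash-partition partial aggregates over p buckets.

    Only keys in `keep` (keys that appear in both relations) are shipped,
    so entries that cannot contribute to the join result are never sent.
    """
    buckets = [defaultdict(int) for _ in range(p)]
    for part in partials:
        for key, total in part.items():
            if key in keep:
                buckets[hash(key) % p][key] += total
    return buckets


def groupby_join(r_fragments, s_fragments):
    """Evaluate SUM(R.v * S.w) grouped by the join key k.

    Hypothetical sketch: fragments are lists of (key, value) pairs,
    one list per simulated processor.
    """
    p = len(r_fragments)
    # Step 1: group-by locally, before any data is moved.
    r_part = [local_group_sum(f) for f in r_fragments]
    s_part = [local_group_sum(f) for f in s_fragments]
    # Step 2: find keys present in both relations (done centrally here;
    # an assumption made to keep the sketch short).
    r_keys = set().union(*(d.keys() for d in r_part))
    s_keys = set().union(*(d.keys() for d in s_part))
    joinable = r_keys & s_keys
    # Step 3: redistribute only the partial aggregates of joinable keys.
    r_buckets = redistribute(r_part, p, joinable)
    s_buckets = redistribute(s_part, p, joinable)
    # Step 4: combine per key; the joined tuples are never materialized.
    result = {}
    for i in range(p):
        for key, r_sum in r_buckets[i].items():
            result[key] = r_sum * s_buckets[i][key]
    return result


R = [[(1, 2), (1, 3), (2, 5)], [(1, 1), (3, 7)]]
S = [[(1, 10), (4, 1)], [(1, 20), (2, 2)]]
answer = groupby_join(R, S)
```

In this toy input, keys 3 and 4 appear in only one relation, so their partial aggregates are dropped before redistribution, and the per-key result is obtained from two small sums rather than from the full set of joined tuples.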

Keywords

PDBMS; Parallel joins; Data skew; Join product skew; GroupBy-Join queries; BSP cost model

Domains

Computer science

https://hal.science/hal-00460664

Submitted on: Monday, March 1, 2010, 9:02:40 PM

Last modification on: Saturday, June 25, 2022, 10:11:01 AM

Dates and versions

hal-00460664 , version 1 (01-03-2010)

Identifiers

HAL Id : hal-00460664 , version 1
DOI : 10.1007/978-3-540-70621-2_19

Cite

Mohamad Al Hajj Hassan, Mostafa Bamha. Parallel Processing of Group-By Join Queries on Shared Nothing Machines. Joaquim Filipe, Boris Shishkov and Markus Helfert. Software and Data Technologies, Extended and revised -ICSOFT'2006 Best papers- Book, Springer Berlin Heidelberg, pp.230-241, 2008, Communications in Computer and Information Science, ⟨10.1007/978-3-540-70621-2_19⟩. ⟨hal-00460664⟩

Collections

UNIV-ORLEANS MSL MSL-THESE
