Conference Papers Year : 2010

Semi-join Computation on Distributed File Systems Using Map-Reduce-Merge Model

Abstract

The semi-join is the most widely used technique for optimizing the processing of complex relational queries on distributed architectures. However, the overhead of semi-join computation can be very high because of data skew and the high cost of communication in distributed architectures. Internet search engines need to process vast amounts of raw data every day. Hence, systems that manage such data should ensure scalability, reliability, and availability while keeping query processing time reasonable. Hadoop and Google's File System are examples of such systems. In this paper, we present a new algorithm based on the Map-Reduce-Merge model and distributed histograms for processing semi-join operations on such systems. A cost analysis of this algorithm shows that our approach is insensitive to data skew while reducing communication and disk input/output costs to a minimum.
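To make the role of histograms concrete, the following is a minimal single-process sketch, not the algorithm from the paper. It assumes each relation is split into fragments (lists of tuples) and that the join attribute sits at a fixed position. The names `local_histogram`, `merge_histograms`, and `semi_join` are hypothetical and chosen only for illustration. The idea shown is that compact per-fragment histograms of the join attribute are enough to decide which tuples of R have a match in S, so full tuples of S never need to be exchanged.

```python
from collections import Counter

def local_histogram(fragment, key_index=0):
    """Count how often each join-attribute value occurs in one fragment."""
    return Counter(row[key_index] for row in fragment)

def merge_histograms(fragments, key_index=0):
    """Combine the local histograms of all fragments of one relation."""
    total = Counter()
    for fragment in fragments:
        total.update(local_histogram(fragment, key_index))
    return total

def semi_join(r_fragments, s_fragments, key_index=0):
    """Return the fragments of R restricted to tuples with a match in S.

    Illustrative assumption: only histograms are compared, which stands in
    for the communication step of a distributed setting.
    """
    hist_r = merge_histograms(r_fragments, key_index)
    hist_s = merge_histograms(s_fragments, key_index)
    # Join-attribute values that appear in both relations.
    matching = hist_r.keys() & hist_s.keys()
    return [
        [row for row in fragment if row[key_index] in matching]
        for fragment in r_fragments
    ]

# Hypothetical toy data: two fragments per relation, join attribute first.
r_fragments = [[(1, "a"), (2, "b")], [(2, "c"), (3, "d")]]
s_fragments = [[(2, "x")], [(3, "y"), (4, "z")]]
result = semi_join(r_fragments, s_fragments)
```

In this toy run, the tuple with key 1 is dropped from R because no tuple of S carries that key, while the S tuple with key 4 plays no part in the result.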

Dates and versions

hal-00460665 , version 1 (01-03-2010)

Identifiers

  • HAL Id : hal-00460665 , version 1

Cite

Mohamad Al Hajj Hassan, Mostafa Bamha. Semi-join Computation on Distributed File Systems Using Map-Reduce-Merge Model. (SAC'2010), Mar 2010, Sierre, Switzerland. pp.406-413. ⟨hal-00460665⟩