A Theoretical and Experimental Comparison of Filter-Based Equijoins in MapReduce

Abstract: MapReduce has become an increasingly popular framework for large-scale data processing. However, complex operations such as joins are quite expensive and require sophisticated techniques. In this paper, we review state-of-the-art strategies for joining several relations in a MapReduce environment and study their extension with filter-based approaches. The general objective of filters is to eliminate non-matching data as early as possible in order to reduce the I/O, communication and CPU costs. We examine the impact of systematically adding filters as early as possible in MapReduce join algorithms, both analytically with cost models and practically with evaluations. The study covers binary joins, multi-way joins and recursive joins, and addresses the case of large inputs that gives rise to the most intricate challenges.
Document type: Book sections

https://hal.archives-ouvertes.fr/hal-01408492
Contributor: Laurent d'Orazio
Submitted on: Friday, December 9, 2016 - 3:25:19 PM
Last modification on: Wednesday, January 15, 2020 - 11:13:11 AM
Long-term archiving on: Tuesday, March 21, 2017 - 1:09:42 PM

File

2016tldks.pdf
Files produced by the author(s)

Citation

Thuong-Cang Phan, Laurent d'Orazio, Philippe Rigaux. A Theoretical and Experimental Comparison of Filter-Based Equijoins in MapReduce. Transactions on Large-Scale Data- and Knowledge-Centered Systems XXV, 9620, Springer, pp.33-70, 2016, Lecture Notes in Computer Science, 978-3-662-49533-9. ⟨10.1007/978-3-662-49534-6_2⟩. ⟨hal-01408492⟩
