Journal articles

Performance Model of MapReduce Iterative Applications for Hybrid Cloud Bursting

Abstract: Hybrid cloud bursting (i.e., leasing temporary off-premise cloud resources to boost the overall capacity during peak utilization) can be a cost-effective way to deal with the increasing complexity of big data analytics, especially for iterative applications. However, the low-throughput, high-latency network link between the on-premise and off-premise resources (the "weak link") makes maintaining scalability difficult. While several data locality techniques have been designed for big data bursting on hybrid clouds, their effectiveness is difficult to estimate in advance. Yet such estimations are critical, because they help users decide whether the extra pay-as-you-go cost incurred by using the off-premise resources justifies the runtime speed-up. To this end, the current paper presents a performance model and methodology to estimate the runtime of iterative MapReduce applications in a hybrid cloud-bursting scenario. The paper focuses on the overhead incurred by the weak link at fine granularity, for both the map and the reduce phases. This approach enables high estimation accuracy, as demonstrated by extensive experiments at scale using a mix of real-world iterative MapReduce applications from standard big data benchmarking suites that cover a broad spectrum of data patterns. Not only are the produced estimations accurate in absolute terms compared with experimental results, but they are also up to an order of magnitude more accurate than those of state-of-the-art estimation approaches originally designed for single-site MapReduce deployments.
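The abstract motivates runtime estimation by the decision it enables: is the extra pay-as-you-go cost of off-premise resources justified by the speed-up? The sketch below illustrates only that final decision step, assuming runtime estimates are already available (for example, from a performance model such as the one the paper proposes). It does not reproduce the paper's model. All names (`BurstingEstimate`, `worth_bursting`, the cost-per-hour-saved threshold) and the simple linear per-VM-hour billing are hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BurstingEstimate:
    # Hypothetical inputs; the runtimes are assumed to come from some
    # performance model, not computed here.
    onprem_runtime_h: float     # estimated runtime, on-premise nodes only (hours)
    hybrid_runtime_h: float     # estimated runtime with off-premise nodes (hours)
    offprem_nodes: int          # number of leased off-premise VMs
    price_per_node_hour: float  # assumed linear pay-as-you-go price per VM-hour


def speedup(e: BurstingEstimate) -> float:
    """Ratio of on-premise-only runtime to hybrid runtime."""
    return e.onprem_runtime_h / e.hybrid_runtime_h


def extra_cost(e: BurstingEstimate) -> float:
    """Cost of the leased VMs, billed for the whole hybrid run (no rounding)."""
    return e.offprem_nodes * e.price_per_node_hour * e.hybrid_runtime_h


def cost_per_hour_saved(e: BurstingEstimate) -> float:
    """Extra cost divided by hours saved; infinite if bursting saves no time,
    e.g. when weak-link overhead outweighs the added capacity."""
    saved = e.onprem_runtime_h - e.hybrid_runtime_h
    if saved <= 0:
        return float("inf")
    return extra_cost(e) / saved


def worth_bursting(e: BurstingEstimate, max_cost_per_hour_saved: float) -> bool:
    """Hypothetical decision rule: burst only if each saved hour is cheap enough."""
    return cost_per_hour_saved(e) <= max_cost_per_hour_saved


if __name__ == "__main__":
    est = BurstingEstimate(onprem_runtime_h=10.0, hybrid_runtime_h=6.0,
                           offprem_nodes=8, price_per_node_hour=0.5)
    print(speedup(est), extra_cost(est), cost_per_hour_saved(est))
    print(worth_bursting(est, max_cost_per_hour_saved=10.0))
```

With these made-up numbers, bursting costs 24.0 currency units and saves 4 hours (6.0 per hour saved), so it passes a threshold of 10.0. The decision is only as good as the runtime estimates fed into it, which is why estimation accuracy under the weak link matters.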

Cited literature: 41 references

Contributor: Bogdan Nicolae
Submitted on: Wednesday, January 30, 2019 - 4:47:16 AM
Last modification on: Thursday, February 7, 2019 - 4:32:50 PM


Files produced by the author(s)


HAL Id: hal-01999033, version 1


Francisco Clemente-Castello, Bogdan Nicolae, Rafael Mayo, Juan Carlos Fernandez. Performance Model of MapReduce Iterative Applications for Hybrid Cloud Bursting. IEEE Transactions on Parallel and Distributed Systems, Institute of Electrical and Electronics Engineers, 2018, 29 (8), pp.1794-1807. ⟨hal-01999033⟩


