Journal article, Scalable Computing: Practice and Experience, Year: 2018

Exact and heuristical data workflow placement algorithms for big data computing in cloud datacenters

Abstract

Many big data-driven applications are now carried out collaboratively on distributed infrastructure. These applications usually involve experiments at massive scale, and the data they generate are huge and stored at multiple geographic locations for reuse. Workflow systems, composed of jobs that use collaborative task-based models, introduce new dependency and data exchange requirements. This raises new issues when selecting distributed data and storage resources so that applications execute on time and resource usage is cost-efficient. In this paper, we present an efficient data placement approach to improve the performance of workflow processing in distributed data centres. The proposed approach handles two types of intermediate data: splittable and unsplittable. Moreover, we place intermediate data by considering not only their source location but also their dependencies. The main objective is to minimise the total storage cost, including the effort of transferring, storing, and moving the data according to the applications' needs. We first propose an exact algorithm which takes into account the intra-job dependencies, and we show that the optimal fractional intermediate data placement problem is NP-hard. To solve the problem of unsplittable intermediate data placement, we propose a greedy heuristic algorithm based on a network flow optimisation framework. The experimental results show that the performance of our approach is very promising. We also show that, even under divergent conditions, the heuristic approach achieves a cost close to that of the optimal solution.
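To make the unsplittable placement problem concrete, the following is a minimal illustrative sketch and not the authors' algorithm. The paper's heuristic is built on a network flow optimisation framework; this sketch swaps in a plain capacity-constrained greedy rule. It assumes a linear cost model (storage price per GB plus transfer price per GB between data centres), and every name in it (`Dataset`, `placement_cost`, `greedy_place`, the price tables) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """An unsplittable intermediate dataset (hypothetical model)."""
    name: str
    size_gb: float
    source: str        # data centre where the dataset is produced
    consumers: tuple   # data centres whose jobs depend on the dataset


def placement_cost(ds, dc, storage_price, transfer_price):
    """Assumed linear cost of keeping `ds` at data centre `dc`."""
    store = ds.size_gb * storage_price[dc]
    # Moving the data from where it was produced to where it is stored.
    move_in = ds.size_gb * transfer_price[ds.source][dc]
    # Shipping the data to every data centre that runs a dependent job.
    move_out = sum(ds.size_gb * transfer_price[dc][c] for c in ds.consumers)
    return store + move_in + move_out


def greedy_place(datasets, capacity_gb, storage_price, transfer_price):
    """Place each dataset whole at the cheapest data centre with room left.

    Datasets are handled largest first; this ordering is a common
    greedy choice, not one taken from the paper.
    """
    free = dict(capacity_gb)   # remaining capacity per data centre
    placement = {}
    total = 0.0
    for ds in sorted(datasets, key=lambda d: d.size_gb, reverse=True):
        feasible = [dc for dc in free if free[dc] >= ds.size_gb]
        if not feasible:
            raise ValueError(f"no data centre can hold {ds.name}")
        best = min(
            feasible,
            key=lambda dc: placement_cost(ds, dc, storage_price, transfer_price),
        )
        placement[ds.name] = best
        free[best] -= ds.size_gb
        total += placement_cost(ds, best, storage_price, transfer_price)
    return placement, total
```

Because each dataset must be stored whole, an early choice can use up the capacity of a cheap data centre and force later datasets to a costlier one. This is why a greedy result can differ from the optimum, and why comparing heuristic and optimal costs is informative.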

Dates and versions

hal-01997451, version 1 (29-01-2019)

Cite

Sonia Ikken, Eric Renault, Abdelkamel Tari, M. Tahar Kechadi. Exact and heuristical data workflow placement algorithms for big data computing in cloud datacenters. Scalable Computing: Practice and Experience, 2018, 19 (3), pp. 223-244. ⟨10.12694/scpe.v19i3.1365⟩. ⟨hal-01997451⟩