Cisco, White paper: Cisco VNI forecast and methodology, 2016.

T. Guo, U. Sharma, T. Wood, S. Sahu, and P. Shenoy, Seagull: Intelligent cloud bursting for enterprise applications, USENIX ATC '12: Conference on Annual Technical Conference, pp.33-33, 2012.

F. J. Clemente-Castelló, B. Nicolae, R. Mayo, J. C. Fernández, and M. M. Rafique, On exploiting data locality for iterative MapReduce applications in hybrid clouds, BDCAT '16: 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, pp.118-122, 2016.

F. J. Clemente-Castelló, B. Nicolae, K. Katrinis, M. M. Rafique, R. Mayo et al., Enabling big data analytics in the hybrid cloud using iterative MapReduce, UCC '15: 8th IEEE/ACM International Conference on Utility and Cloud Computing, pp.290-299, 2015.

T. White, Hadoop: The Definitive Guide. USA, 2010.

F. J. Clemente-Castelló, B. Nicolae, M. M. Rafique, R. Mayo, and J. C. Fernández, Evaluation of data locality strategies for hybrid cloud bursting of iterative MapReduce, CCGrid '17: 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp.181-185, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01469991

T. Gunarathne, T. Wu, J. Qiu, and G. Fox, MapReduce in the clouds for science, CloudCom '10: 2nd IEEE Conference on Cloud Computing Technology and Science, pp.565-572, 2010.

X. Zhang, L. T. Yang, C. Liu, and J. Chen, A scalable two-phase top-down specialization approach for data anonymization using MapReduce on cloud, IEEE Transactions on Parallel and Distributed Systems, vol.25, issue.2, pp.363-373, 2014.

B. Nicolae, P. Riteau, and K. Keahey, Bursting the cloud data bubble: Towards transparent storage elasticity in IaaS clouds, IPDPS '14: 28th IEEE International Parallel and Distributed Processing Symposium, pp.135-144, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00947599

B. Nicolae, P. Riteau, and K. Keahey, Transparent Throughput Elasticity for IaaS Cloud Storage Using Guest-Side Block-Level Caching, UCC '14: 7th IEEE/ACM International Conference on Utility and Cloud Computing, 2014.

B. Nicolae, C. Costa, C. Misale, K. Katrinis, and Y. Park, Leveraging adaptive I/O to optimize collective data shuffling patterns for big data analytics, IEEE Transactions on Parallel and Distributed Systems, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01531374

Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst, HaLoop: Efficient iterative data processing on large clusters, Proc. VLDB Endow, vol.3, issue.1-2, pp.285-296, 2010.

Y. Zhang, Q. Gao, L. Gao, and C. Wang, iMapReduce: A distributed computing framework for iterative computation, Journal of Grid Computing, vol.10, issue.1, pp.47-68, 2012.

B. Nicolae, P. Riteau, and K. Keahey, Towards transparent throughput elasticity for IaaS cloud storage: Exploring the benefits of adaptive block-level caching, International Journal of Distributed Systems and Technologies, vol.6, issue.4, pp.21-44, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01199464

F. Tian and K. Chen, Towards optimal resource provisioning for running MapReduce programs in public clouds, CLOUD '11: IEEE International Conference on Cloud Computing, pp.155-162, 2011.

K. Chen, J. Powers, S. Guo, and F. Tian, CRESP: Towards optimal resource provisioning for MapReduce computing in public clouds, IEEE Transactions on Parallel and Distributed Systems, vol.25, issue.6, pp.1403-1412, 2014.

P. Lama and X. Zhou, AROMA: Automated resource allocation and configuration of MapReduce environment in the cloud, ICAC '12: 9th International Conference on Autonomic Computing, pp.63-72, 2012.

H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong et al., Starfish: A self-tuning system for big data analytics, CIDR '11: 5th Biennial Conference on Innovative Data Systems Research, pp.261-272, 2011.

A. Verma, L. Cherkasova, and R. H. Campbell, ARIA: Automatic Resource Inference and Allocation for MapReduce Environments, ICAC '11: 8th ACM International Conference on Autonomic Computing, 2011.

A. Verma, L. Cherkasova, and R. H. Campbell, Resource provisioning framework for MapReduce jobs with performance goals, Middleware '11: 12th ACM/IFIP/USENIX International Middleware Conference, pp.165-186, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01597764

H. Herodotou, Hadoop Performance Models, CS-2011-05, 2011.

Z. Zhang, L. Cherkasova, and B. T. Loo, Benchmarking approach for designing a MapReduce performance model, ICPE '13: 4th ACM/SPEC International Conference on Performance Engineering, pp.253-258, 2013.

Z. Zhang, L. Cherkasova, and B. T. Loo, Performance modeling of MapReduce jobs in heterogeneous cloud environments, CLOUD '13: 6th IEEE International Conference on Cloud Computing, pp.839-846, 2013.

M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, Improving MapReduce performance in heterogeneous environments, OSDI '08: 8th USENIX Conference on Operating Systems Design and Implementation, pp.29-42, 2008.

F. Ahmad, S. T. Chakradhar, A. Raghunathan, and T. N. Vijaykumar, Tarazu: Optimizing MapReduce on heterogeneous clusters, ASPLOS '12: 17th International Conference on Architectural Support for Programming Languages and Operating Systems, pp.61-74, 2012.

J. Polo, D. Carrera, Y. Becerra, V. Beltran, J. Torres et al., Performance management of accelerated MapReduce workloads in heterogeneous clusters, 2010.

K. Shvachko, H. Huang, S. Radia, and R. Chansler, The Hadoop distributed file system, MSST '10: 26th IEEE Symposium on Massive Storage Systems and Technologies, 2010.

M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma et al., Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing, NSDI'12: 9th USENIX Conference on Networked Systems Design and Implementation, vol.2, pp.1-2, 2012.

G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn et al., Pregel: A system for large-scale graph processing, SIGMOD'10: The 2010 ACM SIGMOD International Conference on Management of Data, pp.135-146, 2010.

Apache, Apache Hadoop Rumen.

S. Godard, Sysstat utilities for the Linux OS.

A. Alexandrov, R. Bergmann, S. Ewen, J. Freytag, F. Hueske et al., The Stratosphere platform for big data analytics, VLDB J, vol.23, issue.6, pp.939-964, 2014.

S. Ahn and S. Park, An analytical approach to evaluation of SSD effects under MapReduce workloads, Journal of Semiconductor Technology and Science, vol.15, pp.511-518, 2015.

S. H. Mohamed, T. E. El-Gorashi, and J. M. Elmirghani, On the energy efficiency of MapReduce shuffling operations in data centers, ICTON '17: 19th International Conference on Transparent Optical Networks, pp.1-5, 2017.

H. Bock, Clustering methods: A history of K-Means algorithms, Selected Contributions in Data Analysis and Classification, pp.161-172, 2007.

W. Zhao, H. Ma, and Q. He, Parallel K-Means clustering based on MapReduce, CloudCom '09: 1st International Conference on Cloud Computing, 2009.

S. Huang, J. Huang, J. Dai, T. Xie, and B. Huang, The HiBench benchmark suite: Characterization of the MapReduce-based data analysis, ICDEW '10: 26th IEEE International Conference on Data Engineering Workshops, pp.41-51, 2010.

S. Brin and L. Page, The anatomy of a large-scale hypertextual web search engine, Comput. Netw. ISDN Syst, vol.30, issue.1-7, pp.107-117, 1998.

T. Seidl, B. Boden, and S. Fries, CC-MR-Finding Connected Components in Huge Graphs with MapReduce, pp.458-473, 2012.

L. Wang, J. Zhan, C. Luo, Y. Zhu, Q. Yang et al., BigDataBench: A big data benchmark suite from internet services
DOI : 10.1109/hpca.2014.6835958
URL : http://arxiv.org/pdf/1401.1406.pdf