T. Hey, S. Tansley, and K. M. Tolle, The Fourth Paradigm: Data-Intensive Scientific Discovery, 2009.
DOI : 10.1007/978-3-642-33299-9_1

J. Dean and S. Ghemawat, MapReduce, Communications of the ACM, vol.51, issue.1, pp.107-113, 2008.
DOI : 10.1145/1327452.1327492

M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma et al., Resilient Distributed Datasets, NSDI'12: The 9th USENIX Symposium on Networked Systems Design and Implementation, pp.15-28
DOI : 10.1145/2886107.2886110

B. Nicolae, C. Costa, C. Misale, K. Katrinis, and Y. Park, Towards Memory-Optimized Data Shuffling Patterns for Big Data Analytics, 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp.409-412, 2016.
DOI : 10.1109/CCGrid.2016.85

URL : https://hal.archives-ouvertes.fr/hal-01355227

G. Graefe, Encapsulation of parallelism in the Volcano query processing system, SIGMOD '90: The ACM SIGMOD International Conference on Management of Data, pp.102-111, 1990.

C. Baru and G. Fecteau, An overview of DB2 parallel edition, ACM SIGMOD Record, vol.24, issue.2, pp.460-462, 1995.
DOI : 10.1145/568271.223876

B. Nicolae, Understanding Vertical Scalability of I/O Virtualization for MapReduce Workloads: Challenges and Opportunities, BigDataCloud '13: 2nd Workshop on Big Data Management in Clouds (held in conjunction with EuroPar'13), 2013.
DOI : 10.1007/978-3-642-54420-0_1

URL : https://hal.archives-ouvertes.fr/hal-00856877

Z. Ren, X. Xu, J. Wan, W. Shi, and M. Zhou, Workload characterization on a production Hadoop cluster: A case study on Taobao, 2012 IEEE International Symposium on Workload Characterization (IISWC), pp.3-13, 2012.
DOI : 10.1109/IISWC.2012.6402895

Z. Guo, G. Fox, and M. Zhou, Investigation of data locality and fairness in MapReduce, MapReduce '12: The Third International Workshop on MapReduce and Its Applications, pp.25-32, 2012.

J. Tan, A. Chin, Z. Z. Hu, Y. Hu, S. Meng et al., DynMR, Proceedings of the Ninth European Conference on Computer Systems, EuroSys '14, pp.1-2, 2014.
DOI : 10.1145/2592798.2592805

J. Zhang, H. Zhou, R. Chen, X. Fan, Z. Guo et al., Optimizing Data Shuffling in Data-parallel Computation by Understanding User-defined Functions, NSDI'12: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, pp.1-2214

K. Ousterhout, R. Rasti, S. Ratnasamy, S. Shenker, and B. Chun, Making sense of performance in data analytics frameworks, NSDI'15: The 12th USENIX Conference on Networked Systems Design and Implementation, pp.293-307, 2015.

M. Li, L. Zeng, S. Meng, J. Tan, L. Zhang et al., MRONLINE, Proceedings of the 23rd international symposium on High-performance parallel and distributed computing, HPDC '14, pp.165-176, 2014.
DOI : 10.1145/2600212.2600229

D. Cheng, J. Rao, Y. Guo, and X. Zhou, Improving MapReduce performance in heterogeneous environments with adaptive task tuning, Proceedings of the 15th International Middleware Conference, Middleware '14, pp.97-108, 2014.
DOI : 10.1145/2663165.2666089

C. L. Abad, N. Roberts, Y. Lu, and R. H. Campbell, A storage-centric analysis of MapReduce workloads: File popularity, temporal locality and arrival patterns, IISWC '12: Proceedings of the 2012 IEEE International Symposium on Workload Characterization, pp.100-109

C. L. Abad, H. Luu, N. Roberts, K. Lee, Y. Lu et al., Metadata Traces and Workload Models for Evaluating Big Storage Systems, 2012 IEEE Fifth International Conference on Utility and Cloud Computing, pp.125-132, 2012.
DOI : 10.1109/UCC.2012.27

B. Nicolae, G. Antoniu, L. Bougé, D. Moise, and A. Carpen-Amarie, BlobSeer: Next-generation data management for large scale infrastructures, Journal of Parallel and Distributed Computing, vol.71, issue.2, pp.169-184, 2011.
DOI : 10.1016/j.jpdc.2010.08.004

URL : https://hal.archives-ouvertes.fr/inria-00511414

H. Li, A. Ghodsi, M. Zaharia, S. Shenker, and I. Stoica, Tachyon, Proceedings of the ACM Symposium on Cloud Computing, SOCC '14, pp.1-6
DOI : 10.1145/2670979.2670985

N. S. Islam, X. Lu, M. W. Rahman, and D. K. Panda, SOR-HDFS: A SEDA-based approach to maximize overlapping in RDMA-enhanced HDFS, HPDC '14: The 23rd International Symposium on High-performance Parallel and Distributed Computing, pp.261-264, 2014.

F. J. Clemente-Castelló, B. Nicolae, K. Katrinis, M. M. Rafique, R. Mayo et al., Enabling Big Data Analytics in the Hybrid Cloud Using Iterative MapReduce, UCC'15: 8th IEEE/ACM International Conference on Utility and Cloud Computing, pp.290-299, 2015.

B. Nicolae, P. Riteau, and K. Keahey, Towards Transparent Throughput Elasticity for IaaS Cloud Storage, International Journal of Distributed Systems and Technologies, vol.6, issue.4, pp.21-44, 2015.
DOI : 10.4018/IJDST.2015100102

URL : https://hal.archives-ouvertes.fr/hal-01199464

G. Greiner and R. Jacob, The Efficiency of MapReduce in Parallel External Memory, LATIN'12: Proceedings of the 10th Latin American International Conference on Theoretical Informatics, pp.433-445, 2012.
DOI : 10.1007/978-3-642-29344-3_37

S. Seo, I. Jang, K. Woo, I. Kim, J. Kim et al., HPMR: Prefetching and pre-shuffling in shared MapReduce computation environment, 2009 IEEE International Conference on Cluster Computing and Workshops, pp.1-8, 2009.
DOI : 10.1109/CLUSTR.2009.5289171

Y. Li, I. Pandis, R. Müller, V. Raman, and G. M. Lohman, NUMA-aware algorithms: the case of data shuffling, CIDR '13: The 6th Biennial Conference on Innovative Data Systems Research, 2013.

M. W. Rahman, X. Lu, N. S. Islam, and D. K. Panda, HOMR, Proceedings of the 28th ACM international conference on Supercomputing, ICS '14, pp.33-42, 2014.
DOI : 10.1145/2597652.2597684

X. Lu, M. W. Rahman, N. Islam, D. Shankar, and D. K. Panda, Accelerating Spark with RDMA for Big Data Processing: Early Experiences, 2014 IEEE 22nd Annual Symposium on High-Performance Interconnects, pp.9-16, 2014.
DOI : 10.1109/HOTI.2014.15

M. Chowdhury, M. Zaharia, J. Ma, M. I. Jordan, and I. Stoica, Managing data transfers in computer clusters with Orchestra, ACM SIGCOMM Computer Communication Review, vol.41, issue.4, pp.98-109, 2011.
DOI : 10.1145/2043164.2018448

A. Davidson and A. Or, Optimizing shuffle performance in Spark, 2013.

B. Nicolae, On the benefits of transparent compression for cost-effective cloud data storage, Transactions on Large-Scale Data- and Knowledge-Centered Systems, pp.167-184, 2011.