X. Wu, X. Zhu, G. Wu, and W. Ding, Data mining with big data, IEEE Transactions on Knowledge and Data Engineering, vol.26, issue.1, pp.97-107, 2014.

H. Li, A. Ghodsi, M. Zaharia, S. Shenker, and I. Stoica, Tachyon: Reliable, memory speed storage for cluster computing frameworks, Proceedings of the ACM Symposium on Cloud Computing, pp.1-15, 2014.

A. Moniruzzaman and S. A. Hossain, NoSQL database: New era of databases for big data analytics-classification, characteristics and comparison, 2013.

A. S. Rawat, D. S. Papailiopoulos, A. G. Dimakis, and S. Vishwanath, Locality and availability in distributed storage, IEEE Transactions on Information Theory, vol.62, issue.8, pp.4481-4493, 2016.

D. R. Cutting and D. R. Karger, Scatter/Gather: A cluster-based approach to browsing large document collections, ACM SIGIR Forum, vol.51, pp.148-159, 2017.

G. Juve, A. Chervenak, E. Deelman, S. Bharathi, G. Mehta et al., Characterizing and profiling scientific workflows, Future Generation Computer Systems, vol.29, issue.3, pp.682-692, 2013.

NERSC storage trends and summaries, 2017.

G. H. Bryan and J. M. Fritsch, A benchmark simulation for moist nonhydrostatic numerical models, Monthly Weather Review, vol.130, issue.12, pp.2917-2928, 2002.

S. Habib, V. Morozov, and H. Finkel, The universe at extreme scale: multi-petaflop sky simulation on the BG/Q, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p.4, 2012.

G. Berriman, J. Good, and D. Curkendall, Montage: An on-demand image mosaic service for the NVO, Astronomical Data Analysis Software and Systems XII, vol.295, p.343, 2003.

R. Graves, T. H. Jordan, S. Callaghan, and E. Deelman, CyberShake: A physics-based seismic hazard model for Southern California, Pure and Applied Geophysics, vol.168, issue.3-4, pp.367-381, 2011.

A. Abramovici and W. E. Althouse, LIGO: The laser interferometer gravitational-wave observatory, pp.325-333, 1992.

M. Dorier, G. Antoniu, F. Cappello, M. Snir, and L. Orf, Damaris: How to efficiently leverage multicore parallelism to achieve scalable, jitter-free I/O, Cluster Computing (CLUSTER), 2012 IEEE International Conference on, pp.155-163, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00715252

L. Pineda-Morales, A. Costan, and G. Antoniu, Towards multi-site metadata management for geographically distributed cloud workflows, Cluster Computing (CLUSTER), 2015 IEEE International Conference on, pp.294-303, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01239150

B. Dong and Q. Zheng, An optimized approach for storing and accessing small files on cloud storage, Journal of Network and Computer Applications, vol.35, issue.6, pp.1847-1862, 2012.

P. Carns, S. Lang, and R. Ross, Small-file access in parallel file systems, Parallel & Distributed Processing, pp.1-11, 2009.

T. White, The small files problem, Cloudera Blog, 2009.

Y. Zhang and D. Liu, Improving the efficiency of storing for small files in HDFS, Computer Science & Service System (CSSS), 2012 International Conference on, pp.2239-2242, 2012.

G. Mackey, Improving metadata management for small files in HDFS, Cluster Computing and Workshops, pp.1-4, 2009.

D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine et al., Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web, Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, pp.654-663, 1997.

H. Lamehamedi, Z. Shentu, B. Szymanski, and E. Deelman, Simulation of dynamic data replication strategies in data grids, Parallel and Distributed Processing Symposium, p.10, 2003.

I. Stoica and R. Morris, Chord: A scalable peer-to-peer lookup service for internet applications, ACM SIGCOMM Computer Communication Review, vol.31, issue.4, pp.149-160, 2001.

G. DeCandia, D. Hastorun, M. Jampani, and G. Kakulapati, Dynamo: Amazon's highly available key-value store, ACM SIGOPS Operating Systems Review, vol.41, issue.6, pp.205-220, 2007.

P. Matri, A. Costan, G. Antoniu, J. Montes, and M. S. Pérez, Towards efficient location and placement of dynamic replicas for geo-distributed data stores, Proceedings of the ACM 7th Workshop on Scientific Cloud Computing, pp.3-9, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01304328

P. Matri, M. S. Pérez, A. Costan, L. Bougé, and G. Antoniu, Keeping up with storage: decentralized, write-enabled dynamic geo-replication, Future Generation Computer Systems, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01617658

D. Jayalakshmi, T. R. Ranjana, and S. Ramaswamy, Dynamic data replication across geo-distributed cloud data centres, International Conference on Distributed Computing and Internet Technology, pp.182-187, 2016.

P. Matri, A. Costan, G. Antoniu, J. Montes, and M. S. Pérez, Týr: blob storage meets built-in transactions, High Performance Computing, Networking, Storage and Analysis, SC16: International Conference for, pp.573-584, 2016.

P. Schwan, Lustre: Building a file system for 1000-node clusters, Proceedings of the 2003 Linux symposium, pp.380-386, 2003.

M. Moore and D. Bonnie, OrangeFS: Advancing PVFS, FAST poster session, 2011.

Grid'5000-Rennes Hardware (Paravance), accessed 2017.

R. B. Ross and R. Thakur, PVFS: A parallel file system for linux clusters, Proceedings of the 4th annual Linux showcase and conference, pp.391-430, 2000.

K. Shvachko, H. Kuang, S. Radia, and R. Chansler, The Hadoop distributed file system, Mass storage systems and technologies (MSST), 2010 IEEE 26th symposium on, pp.1-10, 2010.

Hadoop Documentation - Archives, 2017.

V. G. Korat and K. S. Pamu, Reduction of data at namenode in HDFS using harballing technique, International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), vol.1, issue.4, p.635, 2012.

C. Vorapongkitipun and N. Nupairoj, Improving performance of small-file accessing in Hadoop, Computer Science and Software Engineering (JCSSE), pp.200-205, 2014.

M. Folk, A. Cheng, and K. Yates, HDF5: A file format and I/O library for high performance computing applications, Proceedings of Supercomputing, vol.99, pp.5-33, 1999.

J. Laitala, Metadata management in distributed file systems, 2017.

R. Klophaus, Riak core: Building distributed applications without shared state, ACM SIGPLAN Commercial Users of Functional Programming, p.14, 2010.

S. A. Weil, S. A. Brandt, E. L. Miller, D. D. Long, and C. Maltzahn, Ceph: A scalable, high-performance distributed file system, Proceedings of the 7th symposium on Operating systems design and implementation. USENIX Association, pp.307-320, 2006.

P. H. Lensing, T. Cortes, and A. Brinkmann, Direct lookup and hash-based metadata placement for local file systems, Proceedings of the 6th International Systems and Storage Conference, p.5, 2013.

A. Metwally, D. Agrawal, and A. E. Abbadi, Efficient computation of frequent and top-k elements in data streams, International Conference on Database Theory, pp.398-412, 2005.

M. Zaharia and M. Chowdhury, Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing, Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association, pp.2-2, 2012.

M. Li, J. Tan, Y. Wang, L. Zhang, and V. Salapura, SparkBench: a comprehensive benchmarking suite for in memory data analytic platform spark, Proceedings of the 12th ACM International Conference on Computing Frontiers, p.53, 2015.

E. Anderson, Capture, conversion, and analysis of an intense NFS workload, FAST, vol.9, pp.139-152, 2009.