Strategies for multi-site metadata management

Centralized metadata (baseline)

Replicated metadata (on each site)

Decentralized, non-replicated metadata

Matching strategies to processing patterns

One step further: managing workflow hot metadata

7.3.2 Separate handling of hot and cold metadata

Implementation

10.1.1 Comparative overview of the HPC and BDA stacks

Challenges of storage convergence between HPC and BDA

10.2.2 Storage call distribution for HPC and BDA applications

Blobs as a storage model for convergence

10.3.1 Týr as a storage backend for HPC applications

This fuels a recent trend towards the convergence of HPC and Big Data, which is currently reshaping both worlds. The two communities have diverged significantly in the past, both in the solutions they propose and in their research orientation; as a result, HPC and BDA stacks remain largely separate today. Interestingly, however, their challenges at the data management layer are similar: trading versatility for performance, and moving towards more compute-intensive algorithms to gain deeper insights for descriptive, predictive and prescriptive analytics.

A. Costan, R. Tudoran, G. Antoniu, and G. Brasche, TomusBlobs: Scalable Data-intensive Processing on Azure Clouds, vol.28, pp.950-976, 2016.
URL : https://hal.archives-ouvertes.fr/hal-00767034

G. Antoniu, J. Bigot, L. Bougé, F. Briant, F. Cappello et al., Scalable Data Management for MapReduce-based Data-Intensive Applications: a View for Cloud and Hybrid Infrastructures, International Journal of Cloud Computing, vol.2, issue.2, pp.150-170, 2013.

S. Ene, B. Nicolae, A. Costan, and G. Antoniu, To Overlap or Not to Overlap: Optimizing Incremental MapReduce Computations for On-Demand Data Upload, 5th International Workshop on Data-Intensive Computing in the Clouds (in conjunction with IEEE/ACM SC 2014), pp.9-16, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01094609

I. Legrand, R. Voicu, C. Cirstoiu, C. Grigoras, L. Betev et al., Monitoring and Control of Large Systems with MonALISA, Communications of the ACM, vol.52, pp.49-55, 2009.

O.-C. Marcu, A. Costan, G. Antoniu, and M. S. Pérez, Spark Versus Flink: Understanding Performance in Big Data Analytics Frameworks, IEEE International Conference on Cluster Computing, Taipei, pp.433-442, 2016.

O.-C. Marcu, A. Costan, G. Antoniu, M. S. Pérez, B. Nicolae et al., KerA: Scalable Data Ingestion for Stream Processing, 38th IEEE International Conference on Distributed Computing Systems (ICDCS 2018), pp.1480-1485, 2018.

O.-C. Marcu, A. Costan, G. Antoniu, M. S. Pérez, R. Tudoran et al., Towards a Unified Storage and Ingestion Architecture for Stream Processing, IEEE International Conference on Big Data, pp.2402-2407, 2017.

O.-C. Marcu, R. Tudoran, B. Nicolae, A. Costan, G. Antoniu et al., Exploring Shared State in Key-Value Store for Window-Based Multi-pattern Streaming Analytics, 1st Workshop on the Integration of Extreme Scale Computing and Big Data Management and Analytics, pp.1044-1052, 2017.

P. Matri, A. Costan, G. Antoniu, J. Montes, and M. S. Pérez, Towards Efficient Location and Placement of Dynamic Replicas for Geo-Distributed Data Stores, 7th Workshop on Scientific Cloud Computing, ScienceCloud (in conjunction with ACM HPDC), pp.3-9, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01304328

P. Matri, A. Costan, G. Antoniu, J. Montes, and M. S. Pérez, Týr: Blob Storage Meets Built-In Transactions, IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis, pp.573-584, 2016.

P. Matri, M. S. Pérez, A. Costan, and G. Antoniu, TýrFS: Increasing Small Files Access Performance with Dynamic Metadata Replication, 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp.452-461, 2018.

P. Matri, M. S. Pérez, A. Costan, L. Bougé, and G. Antoniu, Keeping Up With Storage: Decentralized, Write-enabled Dynamic Geo-Replication, Future Generation Computer Systems, vol.86, pp.1093-1105, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01617658

B. Da Mota, R. Tudoran, A. Costan, G. Varoquaux, G. Brasche et al., Machine Learning Patterns for Neuroimaging-Genetic Studies in the Cloud, Frontiers in Neuroinformatics, vol.8, pp.1-28, 2014.

L. Pineda-Morales, A. Costan, and G. Antoniu, Towards Multi-site Metadata Management for Geographically Distributed Cloud Workflows, IEEE International Conference on Cluster Computing, pp.294-303, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01239150

L. Pineda-Morales, J. Liu, A. Costan, E. Pacitti, G. Antoniu et al., Managing Hot Metadata for Scientific Workflows on Multisite Clouds, IEEE International Conference on Big Data, pp.390-397, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01395715

R. Roman, B. Nicolae, A. Costan, and G. Antoniu, Understanding Spark Performance in Hybrid and Multi-Site Clouds, 6th International Workshop on Big Data Analytics: Challenges and Opportunities (in conjunction with IEEE/ACM SC15), pp.10-16, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01239140

R. Tudoran, A. Costan, and G. Antoniu, Big Data Storage and Processing on Azure Clouds: Experiments at Scale and Lessons Learned, Cloud Computing for Data-Intensive Applications, pp.331-355, 2015.

R. Tudoran, A. Costan, and G. Antoniu, DataSteward: Using Dedicated Compute Nodes for Scalable Data Management on Public Clouds, 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TRUSTCOM 2013), pp.1057-1064, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00927283

R. Tudoran, A. Costan, and G. Antoniu, MapIterativeReduce: A Framework for Reduction-Intensive Data Processing on Azure Clouds, 3rd International Workshop on MapReduce and its Applications (in conjunction with ACM HPDC), pp.9-16, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00684814

R. Tudoran, A. Costan, and G. Antoniu, OverFlow: Multi-Site Aware Big Data Management for Scientific Workflows on Clouds, IEEE Transactions on Cloud Computing, vol.4, pp.76-89, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01239128

R. Tudoran, A. Costan, and G. Antoniu, Transfer as a Service: Towards a Cost-Effective Model for Multi-site Cloud Data Management, 33rd IEEE Symposium on Reliable Distributed Systems (SRDS), pp.51-56, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01023282

R. Tudoran, A. Costan, G. Antoniu, and L. Bougé, A Performance Evaluation of Azure and Nimbus Clouds for Scientific Applications, 2nd International Workshop on Cloud Computing Platforms, CloudCP (in conjunction with ACM SIGOPS EuroSys), pp.10-16, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00677842

R. Tudoran, A. Costan, G. Antoniu, and H. Soncu, TomusBlobs: Towards Communication-Efficient Storage for MapReduce Applications in Azure, 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID 2012), pp.427-434, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00670725

R. Tudoran, A. Costan, B. Da Mota, G. Antoniu et al., A-Brain: Using the Cloud to Understand the Impact of Genetic Variability on the Brain, International Workshop on CloudFutures, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00781571

R. Tudoran, A. Costan, O. Nano, I. Santos, H. Soncu et al., JetStream: Enabling High Throughput Live Event Streaming on Multi-site Clouds, Future Generation Computer Systems, vol.54, pp.274-291, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01239124

R. Tudoran, A. Costan, R. Rezai Rad, G. Brasche, and G. Antoniu, Adaptive File Management for Scientific Workflows on the Azure Cloud, IEEE International Conference on Big Data (BIGDATA 2013), pp.273-281, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00926748

R. Tudoran, A. Costan, R. Wang, L. Bougé, and G. Antoniu, Bridging Data in the Clouds: An Environment-Aware System for Geographically Distributed Data Transfers, 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pp.92-101, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00978153

R. Tudoran, O. Nano, I. Santos, A. Costan, H. Soncu et al., JetStream: Enabling High Performance Event Streaming Across Cloud Data-Centers, 8th ACM International Conference on Distributed Event-Based Systems, pp.23-34, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01090281

Other References

K. Aamodt et al., The ALICE experiment at the CERN LHC, Journal of Instrumentation, vol.3, pp.8-20, 2008.
URL : https://hal.archives-ouvertes.fr/in2p3-00311441

R. Agarwal, G. Juve, and E. Deelman, Peer-to-Peer Data Sharing for Scientific Workflows on Amazon EC2, Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis. SCC '12, pp.82-89, 2012.

B. Agrawal, A. Chakravorty, C. Rong, and T. Wlodarczyk, R2Time: a framework to analyse OpenTSDB time-series data in HBase, 6th IEEE International Conference on Cloud Computing Technology and Science, pp.970-975, 2014.

D. Agrawal, A. Butt, K. Doshi, J.-L. Larriba-Pey, M. Li et al., SparkBench - A Spark Performance Testing Suite, Technology Conference on Performance Evaluation and Benchmarking, pp.26-44, 2015.

T. Akidau, R. Bradshaw, C. Chambers, S. Chernyak, R. J. Fernández-Moctezuma et al., The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-scale, Unbounded, Out-of-order Data Processing, Proc. VLDB Endow, vol.8, pp.1792-1803, 2015.

Akka Streams.

N. Ali, P. Carns, K. Iskra, D. Kimpe, S. Lang et al., Scalable I/O forwarding framework for high-performance computing systems, Cluster Computing and Workshops, pp.1-10, 2009.

W. Allcock, GridFTP: Protocol Extensions to FTP for the Grid, Global Grid Forum, GFD-RP.20, 2003.

B. K. Alunkal, Grid EigenTrust: A Framework for Computing Reputation in Grids, 2003.

Amazon S3, 2018.

D. P. Anderson and G. Fedak, The Computational and Storage Potential of Volunteer Computing, 6th IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), vol.1, pp.73-80, 2006.

Apache Drill, 2018.

Apache Edgent, 2018.

Apache Hadoop.

Apache Hive.

Apache Kafka.

Apache Kudu, 2018.

Apache NiFi, 2018.

Apache Oozie, 2018.

Apache Pig.

Apache Pulsar, 2018.

Apache Spark Streaming.

Apache Storm.

Apache Tez, 2018.

M. Armbrust, A. Fox, R. Griffith, A. D. Joseph et al., A view of cloud computing, Communications of the ACM, vol.53, pp.50-58, 2010.

M. Asch and T. Moore, Pathways to Convergence, 2017.

A. Council and U. K. , , 2018.

P. Bailis, A. Fekete, M. J. Franklin, A. Ghodsi et al., Coordination avoidance in database systems, Proceedings of the VLDB Endowment, vol.8, pp.185-196, 2014.

D. Balouek, Adding Virtualization Capabilities to the Grid'5000 Testbed, Cloud Computing and Services Science, vol.367, pp.3-20, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00946971

BDVA, European Big Data Value Strategic Research and Innovation Agenda, 2017.

G. Bell, T. Hey, and A. Szalay, Beyond the Data Deluge, Science, vol.323, pp.1297-1298, 2009.

J. Bennett, Combining in-situ and in-transit processing to enable extreme-scale scientific analysis, SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp.1-9, 2012.

S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M. Su et al., Characterization of scientific workflows, 3rd Workshop on Workflows in Support of Large-Scale Science, pp.1-10, 2008.

Big Data and Extreme-scale Computing Workshop.

D. Borthakur, HDFS architecture guide, Hadoop Apache Project, vol.53, 2008.

I. Botan, G. Alonso, P. M. Fischer, D. Kossmann, and N. Tatbul, Flexible and Scalable Storage Management for Data-intensive Stream Processing, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. EDBT '09, pp.934-945, 2009.

K. Budati, J. Sonnek, A. Chandra, and J. Weissman, Ridge: Combining Reliability and Performance in Open Grid Platforms, Proceedings of the 16th International Symposium on High Performance Distributed Computing. HPDC '07, pp.978-979, 2007.

R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility, Future Generation Computer Systems, vol.25, pp.599-616, 2009.

B. Calder, J. Wang, A. Ogus, N. Nilakantan, A. Skjolsvold et al., Windows Azure Storage: a highly available cloud storage service with strong consistency, Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pp.143-157, 2011.

P. Carbone, S. Ewen, G. Fóra, S. Haridi, S. Richter et al., State Management in Apache Flink: Consistent Stateful Distributed Stream Processing, Proc. VLDB Endow, vol.10, pp.1718-1729, 2017.

P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, and K. Tzoumas, Apache Flink: Stream and Batch Processing in a Single Engine, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol.38, pp.1-11, 2015.

J. L. Carlson, Redis in Action, 2013.

P. Carns, K. Harms, D. Kimpe, R. Ross, J. Wozniak et al., A case for optimistic coordination in hpc storage systems, High Performance Computing, Networking, Storage and Analysis (SCC), pp.48-53, 2012.

Ceph: differences from POSIX.

F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach et al., Bigtable: A distributed storage system for structured data, ACM Transactions on Computer Systems (TOCS), vol.26, pp.4-30, 2008.

K. Chodorow, MongoDB: The Definitive Guide: Powerful and Scalable Data Storage, 2013.

M. Copeland, J. Soh, A. Puca, M. Manning, and D. Gollob, Microsoft azure and cloud computing, Microsoft Azure, pp.3-26, 2015.

A. Das, I. Gupta, and A. Motivala, SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol, Proceedings of the International Conference on Dependable Systems and Networks (DSN 2002), pp.303-312, 2002.

J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Commun. ACM, vol.51, issue.1, pp.107-113, 2008.

G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman et al., Dynamo: Amazon's highly available key-value store, ACM SIGOPS Operating Systems Review, vol.41, issue.6, pp.205-220, 2007.

E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good, The Cost of Doing Science on the Cloud: The Montage Example, Proceedings of the 2008 ACM/IEEE Conference on Supercomputing. SC '08, pp.501-512, 2008.

E. Deelman, Pegasus: A Framework for Mapping Complex Scientific Workflows Onto Distributed Systems, Scientific Programming, vol.13, issue.3, pp.219-237, 2005.

Developing Cloud Applications using the e-Science Central Platform, Philosophical Transactions of the Royal Society A, vol.371, issue.1983, 2013.

R. Escriva and E. G. Sirer, The design and implementation of the warp transactional filesystem, 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, 2016.

R. Escriva, B. Wong, and E. Sirer, HyperDex: A distributed, searchable key-value store, Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication, pp.25-36, 2012.

R. Escriva, B. Wong, and E. Sirer, Warp: Lightweight multi-key transactions for key-value stores, 2015.

P. Fan, Z. Chen, J. Wang, Z. Zheng, and M. R. Lyu, Topology-Aware Deployment of Scientific Applications in Cloud Computing, 2012.

P. Fan, Z. Chen, J. Wang, Z. Zheng, and M. R. Lyu, Scientific application deployment on Cloud: A Topology-Aware Method, Concurrency and Computation: Practice and Experience, 2012.

Flink fault tolerance, 2018.

I. Foster, A. Chervenak, D. Gunter, K. Keahey, R. Madduri et al., Enabling Petascale Data Movement and Analysis, SciDAC Review, 2009.

I. Foster, R. Kettimuthu, S. Martin, S. Tuecke, D. Milroy et al., Campus Bridging Made Easy via Globus Services, Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the Campus and Beyond. XSEDE '12, vol.50, pp.1-50, 2012.
DOI : 10.1145/2335755.2335847

G. C. Fox, S. Jha, J. Qiu, S. Ekanayake, and A. Luckow, Towards a Comprehensive Set of Big Data Benchmarks, Advances in Parallel Computing, vol.26, pp.47-66, 2015.

G. Fox, J. Qiu, S. Jha, S. Ekanayake, and S. Kamburugamuve, Big Data, Simulations and HPC Convergence, 2016.

G. Fox, J. Qiu, S. Jha, S. Ekanayake, and S. Kamburugamuve, Big Data Benchmarking, pp.3-17, 2015.

L. George, HBase: the definitive guide: random access to your planet-size data, 2011.

J. Giardino, J. Haridas, and B. Calder, How to get most out of Windows Azure Tables, 2013.

D. Gibson, Is Your Big Data Hot, Warm, or Cold?, 2012.

J. Ginsberg, M. H. Mohebbi, R. S. Patel, L. Brammer, M. S. Smolinski et al., Detecting influenza epidemics using search engine query data, Nature, vol.457, pp.1012-1014, 2009.
DOI : 10.1038/nature07634

L. B. Gomez and F. Cappello, Improving floating point compression through binary masks, 2013 IEEE International Conference on Big Data, pp.326-331, 2013.
DOI : 10.1109/bigdata.2013.6691591

Google Cloud Platform.

W. Gropp and E. Lusk, Users guide for MPICH, a portable Implementation of MPI, 1996.

S. Guo, R. Dhamankar, and L. Stewart, DistributedLog: A High Performance Replicated Log Service, 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp.1183-1194, 2017.
DOI : 10.1109/icde.2017.163

N. Hayashibara, X. Defago, R. Yared, and T. Katayama, The φ accrual failure detector, Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems (SRDS 2004), pp.66-78, 2004.
DOI : 10.1109/reldis.2004.1353004

N. Hemsoth, HPC and Big Data: A "Best of Both Worlds" Approach, HPCwire, 2014.

The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009.

P. Hunt, M. Konar, F. P. Junqueira, and B. Reed, ZooKeeper: Wait-free Coordination for Internet-scale Systems, USENIX Annual Technical Conference, vol.8, 2010.

IDC, IDC's Data Age, 2017.

M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, Dryad: distributed data-parallel programs from sequential building blocks, Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007. EuroSys '07, pp.59-72, 2007.

H. Jin, S. Ibrahim, T. Bell, W. Gao, D. Huang et al., Cloud types and services, Handbook of Cloud Computing, pp.335-355, 2010.

W. Kabsch, A solution for the best rotation to relate two sets of vectors, Acta Crystallographica Section A, vol.32, issue.5, pp.922-923

D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine et al., Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web, Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, pp.654-663, 1997.

G. Khanna, U. Catalyurek, T. Kurc, R. Kettimuthu, P. Sadayappan et al., Using overlays for efficient data transfer over shared wide-area networks, Proceedings of the 2008 ACM/IEEE conference on Supercomputing. SC '08, vol.47, p.12, 2008.

R. Klophaus, Riak core: building distributed applications without shared state, ACM SIGPLAN Commercial Users of Functional Programming. ACM, p.14, 2010.

T. Kosar, E. Arslan, B. Ross, and B. Zhang, StorkCloud: Data Transfer Scheduling and Optimization As a Service, Proceedings of the 4th ACM Workshop on Scientific Cloud Computing. Science Cloud '13, pp.29-36, 2013.

T. Kosar and M. Livny, A Framework for Reliable and Efficient Data Placement in Distributed Computing Systems, J. Parallel Distrib. Comput, vol.65, pp.1146-1157, 2005.

J. Kreps, N. Narkhede, and J. Rao, Kafka: a Distributed Messaging System for Log Processing, NetDB Conference, 2011.

M. Kuhn, A semantics-aware I/O interface for high performance computing, International Supercomputing Conference, pp.408-421, 2013.

C. Kulkarni, A. Kesavan, R. Ricci, and R. Stutsman, Beyond Simple Request Processing with RAMCloud, IEEE Data Eng. Bull, vol.40, pp.62-69, 2017.

A. Lakshman and P. Malik, Cassandra: a decentralized structured storage system, ACM SIGOPS Operating Systems Review, vol.44, pp.35-40, 2010.

L. Lamport, Paxos made simple, ACM Sigact News, vol.32, pp.18-25, 2001.

N. Leavitt, Will NoSQL databases live up to their promise?, In: Computer, vol.43, issue.2, 2010.

C. Lefevre, The CERN accelerator complex, 2008.

J. J. Levandoski, P.-Å. Larson, and R. Stoica, Identifying hot and cold data in main-memory databases, IEEE 29th International Conference on Data Engineering (ICDE 2013), pp.26-37, 2013.

J. Li, D. Maier, K. Tufte, V. Papadimos, and P. A. Tucker, Semantics and Evaluation Techniques for Window Aggregates in Data Streams, Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. SIGMOD '05, pp.311-322, 2005.

M. Li, J. Tan, Y. Wang, L. Zhang, and V. Salapura, Sparkbench: a comprehensive benchmarking suite for in memory data analytic platform spark, Proceedings of the 12th ACM International Conference on Computing Frontiers, pp.53-63, 2015.

S. Li, H. Lim, V. W. Lee, J. H. Ahn et al., Architecting to achieve a billion requests per second throughput on a single key-value store server platform, ACM SIGARCH Computer Architecture News, vol.43, issue.3, pp.476-488, 2015.

J. Liu, E. Pacitti, P. Valduriez, and M. Mattoso, Scientific workflow scheduling with provenance support in multisite cloud, 12th International Meeting on High-Performance Computing for Computational Science (VECPAR), pp.1-8, 2016.
URL : https://hal.archives-ouvertes.fr/lirmm-01342190

N. Liu, J. Cope, P. Carns, C. Carothers, R. Ross et al., On the role of burst buffers in leadership-class storage systems, International Symposium on Mass Storage Systems and Technologies, pp.1-11, 2012.

W. Liu, B. Tieman, R. Kettimuthu, and I. Foster, A data transfer framework for large-scale science experiments, Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. HPDC '10, pp.978-979, 2010.

I. Lopez, IDC Talks Convergence in High Performance Data Analysis, 2013.

D. Maier, J. Li, P. Tucker, K. Tufte, and V. Papadimos, Semantics of Data Streams and Operators, Proceedings of the 10th International Conference on Database Theory. ICDT'05, pp.37-52, 2005.

M. Malak, Parallel vs. Distributed file systems: Time for RAID on Hadoop? Tech. rep. Data Science Association, 2014.

N. Marz and J. Warren, Big Data: Principles and Best Practices of Scalable Realtime Data Systems, 2015.

J. Meehan, S-Store: Streaming Meets Transaction Processing, Proc. VLDB Endow, vol.8, issue.13, pp.2134-2145, 2015.

N. Megiddo and D. S. Modha, ARC: A Self-Tuning, Low Overhead Replacement Cache, FAST, vol.3, pp.115-130, 2003.

Microsoft Azure Managed Cache Service.

Microsoft Azure Service Bus - Cloud Messaging Service.

Microsoft Cloud Services - Deploy web apps and APIs.

V. Mishra, Titan graph databases with cassandra, Beginning Apache Cassandra Development, pp.123-151, 2014.

B. Momjian, PostgreSQL: introduction and concepts, vol.192, 2001.

H. M. Monti, A. R. Butt, and S. S. Vazhkudai, CATCH: A Cloud-Based Adaptive Data Transfer Service for HPC, Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium. IPDPS '11, pp.1242-1253, 2011.

D. Namiot, Time Series Databases, pp.132-137

G. Németh, D. Géhberger, and P. Mátray, DAL: A Locality-optimizing Distributed Shared Memory System, Proceedings of the 9th USENIX Conference on Hot Topics in Cloud Computing. HotCloud'17, pp.12-12, 2017.

B. Nicolae, G. Antoniu, L. Bougé, D. Moise, and A. Carpenamarie, BlobSeer: Next-generation data management for large scale infrastructures, Journal of Parallel and Distributed Computing, vol.71, pp.169-184, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00511414

B. Nicolae, P. Riteau, and K. Keahey, Bursting the Cloud Data Bubble: Towards Transparent Storage Elasticity in IaaS Clouds, 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp.135-144, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00947599

S. A. Noghabi, K. Paramasivam, Y. Pan, N. Ramesh et al., Samza: Stateful Scalable Stream Processing at LinkedIn, Proc. VLDB Endow, vol.10, pp.1634-1645, 2017.

E. Ogasawara, J. Dias, F. Porto, P. Valduriez, and M. Mattoso, An algebraic approach for data-centric scientific workflows, Proceedings of VLDB Endowment, vol.4, pp.1328-1339, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00640431

D. Ongaro, S. M. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum, Fast Crash Recovery in RAMCloud, Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. SOSP '11, pp.29-41, 2011.

J. Ousterhout, The Case for RAMClouds: Scalable High-performance Storage Entirely in DRAM, SIGOPS Oper. Syst. Rev, vol.43, pp.92-105, 2010.

J. Panziera, ETP4HPC Strategic Research Agenda. Tech. rep. ETP4HPC, 2017.

Y. Park, S. Lim, C. Lee, and K. Park, PFFS: a scalable flash memory file system for the hybrid architecture of phase-change RAM and NAND flash, Proceedings of the 2008 ACM symposium on Applied computing, pp.1498-1503, 2008.

A. Parrott and L. Warshaw, 2018.

A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. Dewitt et al., A Comparison of Approaches to Large-scale Data Analysis, Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. SIGMOD '09, pp.165-178, 2009.

Pravega, 2018.

C. Raiciu, C. Paasch, S. Barre, A. Ford, M. Honda et al., How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP, Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. NSDI'12, pp.29-29, 2012.

D. A. Reed and J. Dongarra, Exascale computing and big data, Communications of the ACM, vol.58, issue.7, pp.56-68, 2015.

D. P. Reed, Implementing atomic actions on decentralized data, ACM Transactions on Computer Systems (TOCS), vol.1, issue.1, pp.3-23, 1983.

P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur, PVFS: A parallel file system for Linux clusters, Proceedings of the 4th annual Linux showcase and conference, pp.391-430, 2000.

S. Sanfilippo, Redis Cache

P. Schwan, Lustre: Building a file system for 1000-node clusters, Proceedings of the 2003 Linux symposium, pp.380-386, 2003.

G. M. Shipman, D. A. Dillow, S. Oral, and F. Wang, The Spider Center Wide File System: From Concept to Reality, Tech. rep. Oak Ridge National Lab, 2009.

K. Shvachko, H. Kuang, S. Radia, and R. Chansler, The hadoop distributed file system, Mass storage systems and technologies (MSST), 2010 IEEE 26th symposium on, pp.1-10, 2010.

B. Sigoure, OpenTSDB: The distributed, scalable time series database, Proc. OSCON, vol.11, 2010.

Y. Simmhan, C. Van-ingen, G. Subramanian, and J. Li, Bridging the Gap between Desktop and the Cloud for eScience Applications, Proceedings of the 2010 IEEE 3rd International Conference on Cloud Computing. CLOUD '10, pp.474-481, 2010.

V. Srinivasan, B. Bulkowski, W. Chu, S. Sayyaparaju, A. Gooding et al., Aerospike: architecture of a real-time operational DBMS, Proceedings of the VLDB Endowment, vol.9, pp.1389-1400, 2016.

I. Stoica, R. Morris, D. Liben-nowell, D. R. Karger, F. Kaashoek et al., Chord: a scalable peer-to-peer lookup protocol for internet applications, IEEE/ACM Transactions on Networking (TON), vol.11, issue.1, pp.17-32, 2003.

M. Szeredi, FUSE: Filesystem in userspace

D. B. Terry, M. M. Theimer, K. Petersen, A. J. Demers et al., Managing update conflicts in Bayou, a weakly connected replicated storage system, vol.29, 1995.

The A-Brain Project, 2018.

A. The and . Mapreduce, , 2018.

A. The and . Overflow, , 2018.

The BigStorage Project, 2018.

The HIRP Project on Low Latency for Stream Storage, 2018.

A. Tirumala, F. Qin, J. Dugan, J. Ferguson, and K. Gibbs, iPerf: TCP/UDP bandwidth measurement tool, 2005.

K. Tzoumas, High-throughput, low-latency, and exactly-once stream processing with Apache Flink, 2015.

V. Vavilapalli, Apache Hadoop YARN: Yet Another Resource Negotiator, Proceedings of the 4th Annual Symposium on Cloud Computing. SOCC '13, vol.5, p.16, 2013.

M. Vilayannur, S. Lang, R. Ross, R. Klundt, and L. Ward, Extending the POSIX I/O interface: a parallel file system perspective, 2008.

J. Wang, D. Crawl, and I. Altintas, Kepler + Hadoop: a general architecture facilitating data-intensive applications in scientific workflow systems, Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science. WORKS'09, pp.1-8, 2009.

L. Wang, BigDataBench: A big data benchmark suite from Internet services, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), 2014.

S. A. Weil, S. A. Brandt, E. L. Miller et al., Ceph: A scalable, high-performance distributed file system, Proceedings of the 7th symposium on Operating systems design and implementation. USENIX Association, pp.307-320, 2006.

S. A. Weil, A. W. Leung, S. A. Brandt, and C. Maltzahn, RADOS: a scalable, reliable storage service for petabyte-scale storage clusters, Proceedings of the 2nd international workshop on Petascale data storage: held in conjunction with Supercomputing'07, pp.35-44, 2007.

What is Hadoop YARN. Tech. rep. COSO IT, 2017.

F. Yang, E. Tschetter, X. Léauté, N. Ray, G. Merlino et al., Druid: A Real-time Analytical Data Store, Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. SIGMOD '14, pp.157-168, 2014.

M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, Spark: Cluster Computing with Working Sets, Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing. HotCloud'10, pp.10-10, 2010.

M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker et al., Discretized Streams: Fault-tolerant Streaming Computation at Scale, Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. SOSP '13, pp.423-438, 2013.

Q. Zheng, K. Ren, G. Gibson, W. Bradley, G. Settlemyer et al., DeltaFS: Exascale file systems scale better without dedicated servers, Proceedings of the 10th Parallel Data Storage Workshop, pp.1-6, 2015.