Apache Beam.

Cascading | Application Platform for Enterprise Big Data.

Coservit.

Flickr.

N. Flowhub.

Microsoft.

Nagios: the industry standard in IT infrastructure monitoring.

Openstack.

Pipeline.

Puma benchmarks and dataset downloads.

PythonSpeed/PerformanceTips, Python Wiki.

Smart-support Center.

Xplenty.

Adsquare, 2016.

Shinken, 2017.

Zabbix, 2017.

V. Akgiray, Conditional heteroscedasticity in time series of stock returns: Evidence and forecasts, Journal of business, pp.55-80, 1989.

T. Akidau, A. Balikov, K. Bekiroğlu, S. Chernyak, J. Haberman et al., Millwheel: fault-tolerant stream processing at internet scale, Proceedings of the VLDB Endowment, vol.6, pp.1033-1044, 2013.

T. Akidau, R. Bradshaw, C. Chambers, S. Chernyak, R. J. Fernández-Moctezuma et al., The dataflow model: A practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing, Proceedings of the VLDB Endowment, vol.8, pp.1792-1803, 2015.

OSGi Alliance, OSGi: the dynamic module system for Java, 2009.

R. Ananthanarayanan, V. Basker, S. Das, A. Gupta, H. Jiang et al., Photon: Fault-tolerant and scalable joining of continuous data streams, Proceedings of the 2013 ACM SIGMOD international conference on management of data, pp.577-588, 2013.

C. Anderson, The Long Tail: Why the Future of Business Is Selling Less of More, Hyperion, 2006.

L. Aniello, R. Baldoni, and L. Querzoni, Adaptive online scheduling in storm, Proceedings of the 7th ACM International Conference on Distributed Event-based Systems, DEBS '13, pp.207-218, 2013.

M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz et al., A view of cloud computing, Communications of the ACM, vol.53, issue.4, pp.50-58, 2010.

M. Armbrust, R. S. Xin, C. Lian, Y. Huai, D. Liu et al., Spark sql: Relational data processing in spark, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp.1383-1394, 2015.

J. Atwood, The best code is no code at all, 2007.

S. Babu, Towards automatic optimization of mapreduce programs, Proceedings of the 1st ACM symposium on Cloud computing, pp.137-142, 2010.

X. Bai, A. Jégou, F. Junqueira, and V. Leroy, Dynasore: Efficient in-memory store for social applications, Middleware 2013-ACM/IFIP/USENIX 14th International Middleware Conference, Beijing, pp.425-444, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00932468

P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris et al., Xen and the art of virtualization, ACM SIGOPS operating systems review, vol.37, pp.164-177, 2003.

L. A. Barroso, J. Clidaras, and U. Hölzle, The datacenter as a computer: An introduction to the design of warehouse-scale machines, Synthesis lectures on computer architecture, vol.8, issue.3, pp.1-154, 2013.

D. Bau, J. Gray, C. Kelleher, J. Sheldon, and F. Turbak, Learnable programming: Blocks and beyond, Commun. ACM, vol.60, issue.6, pp.72-80, 2017.

S. Beheshti-kashi, H. R. Karimi, K. Thoben, M. Lütjen, and M. Teucke, A survey on retail sales forecasting and prediction in fashion markets, Systems Science & Control Engineering, vol.3, issue.1, pp.154-161, 2015.

F. Bellard, Qemu, a fast and portable dynamic translator, USENIX Annual Technical Conference, FREENIX Track, pp.41-46, 2005.

G. Bontempi, S. B. Taieb, and Y.-A. Le Borgne, Machine learning strategies for time series forecasting, Business Intelligence, pp.62-77, 2013.

G. Box and G. Jenkins, Time series analysis: forecasting and control, Holden-Day series in time series analysis, 1970.

L. Breiman, Random forests, Machine Learning, vol.45, pp.5-32, 2001.

E. Bruneton, T. Coupaye, M. Leclercq, V. Quéma, and J. Stefani, The fractal component model and its support in java, Software: Practice and Experience, vol.36, pp.1257-1284, 2006.

C. Budak, T. Georgiou, D. Agrawal, and A. E. Abbadi, Geoscope: Online detection of geo-correlated information trends in social networks, Proc. VLDB Endow, vol.7, pp.229-240, 2013.

B. Burns, B. Grant, D. Oppenheimer, E. Brewer, J. Wilkes et al., Borg, Omega, and Kubernetes, ACM Queue, vol.14, pp.70-93, 2016.

S. D. Campbell and F. Diebold, Weather forecasting for weather derivatives, Journal of the American Statistical Association, vol.100, pp.6-16, 2005.

M. Caneill, N. D. Palma, A. Ait-bachir, B. Dine, R. Mokhtari et al., Online metrics prediction in monitoring systems, 2018 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2018.
URL : https://hal.archives-ouvertes.fr/hal-02006574

M. Caneill, A. El-rheddane, V. Leroy, and N. D. Palma, Locality-aware routing in stateful streaming applications, Proceedings of the 17th International Middleware Conference, Middleware '16, vol.4, pp.1-4, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01407457

P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi et al., Apache flink: Stream and batch processing in a single engine, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol.36, issue.4, 2015.

V. Cardellini, V. Grassi, F. L. Presti, and M. Nardelli, Distributed QoS-aware scheduling in storm, Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems, DEBS '15, pp.344-347, 2015.

T. Chalermarrewong, T. Achalakul, and S. C. See, Failure prediction of data centers using time series and fault tree analysis, 2012 IEEE 18th International Conference on Parallel and Distributed Systems, pp.794-799, 2012.

C. Chatfield, The Analysis of Time Series: An Introduction, Sixth Edition, Chapman & Hall/CRC Texts in Statistical Science, 2003.

S. Chen and J. Hwang, Temperature prediction using fuzzy time series, IEEE Transactions on Systems, Man, and Cybernetics, vol.30, issue.2, pp.263-275, 2000.

Y. G. Cinar, H. Mirisaee, P. Goswami, E. Gaussier, A. Ait-bachir et al., Time series forecasting using rnns: an extended attention mechanism to model periods and handle missing values, 2017.

G. G. Creamer and Y. Freund, Predicting performance and quantifying corporate governance risk for Latin American ADRs and banks, Financial Engineering and Applications, 2004.

C. Curino, E. Jones, Y. Zhang, and S. Madden, Schism: A workload-driven approach to database replication and partitioning, Proc. VLDB Endow, vol.3, issue.1-2, pp.48-57, 2010.

A. Davidson and A. Or, Optimizing shuffle performance in spark, 2013.

J. G. De Gooijer and R. J. Hyndman, 25 years of time series forecasting, International journal of forecasting, vol.22, issue.3, pp.443-473, 2006.

J. Dean and S. Ghemawat, Mapreduce: Simplified data processing on large clusters, OSDI, pp.137-150, 2004.

J. Demšar, T. Curk, A. Erjavec, Č. Gorup, T. Hočevar et al., Orange: Data mining toolbox in python, Journal of Machine Learning Research, vol.14, pp.2349-2353, 2013.

L. Fischer and A. Bernstein, Workload scheduling in distributed stream processors using graph partitioning, 2015 IEEE International Conference on Big Data (Big Data), pp.124-133, 2015.

M. Franklin, The berkeley data analytics stack: Present and future, Big Data, 2013 IEEE International Conference on, pp.2-3, 2013.

N. Fraser, 2013.

T. V. Gestel, J. A. Suykens, D. E. Baestaens, A. Lambrechts, G. Lanckriet et al., Financial time series prediction using least squares support vector machines within the evidence framework, IEEE Transactions on Neural Networks, vol.12, issue.4, pp.809-821, 2001.

M. Ghassemi, M. A. Pimentel, T. Naumann, T. Brennan, D. A. Clifton et al., A multivariate timeseries modeling approach to severity of illness assessment and forecasting in icu with sparse, heterogeneous clinical data, Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI'15, pp.446-453, 2015.

C. L. Giles, S. Lawrence, and A. C. Tsoi, Noisy time series prediction using recurrent neural networks and grammatical inference, Machine learning, vol.44, issue.1, pp.161-183, 2001.

J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin et al., Graphx: Graph processing in a distributed dataflow framework, 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pp.599-613, 2014.

G. Hamilton, JavaBeans, API Specification, 1997.

J. D. Hamilton, Time series analysis, vol.2, 1994.

B. Harvey, D. D. Garcia, T. Barnes, N. Titterton, O. Miller et al., Snap! (build your own blocks) (abstract only), Proceedings of the 45th ACM Technical Symposium on Computer Science Education, SIGCSE '14, pp.749-749, 2014.
DOI : 10.1145/2445196.2445507

H. Herodotou and S. Babu, Profiling, what-if analysis, and cost-based optimization of mapreduce programs, Proceedings of the VLDB Endowment, vol.4, pp.1111-1122, 2011.

B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph et al., Mesos: A platform for fine-grained resource sharing in the data center, NSDI, vol.11, pp.22-22, 2011.

P. Hunt, M. Konar, F. P. Junqueira, and B. Reed, Zookeeper: Wait-free coordination for internet-scale systems, USENIX annual technical conference, vol.8, 2010.

E. Jahani, M. J. Cafarella, and C. Ré, Automatic optimization for mapreduce programs, Proceedings of the VLDB Endowment, vol.4, pp.385-396, 2011.

T. Jiang, Q. Zhang, R. Hou, L. Chai, S. A. Mckee et al., Understanding the behavior of in-memory computing workloads, 2014 IEEE International Symposium on Workload Characterization (IISWC), pp.22-30, 2014.

E. Jonas, S. Venkataraman, I. Stoica, and B. Recht, Occupy the cloud: Distributed computing for the 99%, 2017.

J. Lehtosalo,

P. Kamp and R. N. Watson, Jails: Confining the omnipotent root, Proceedings of the 2nd International SANE Conference, vol.43, p.116, 2000.

G. Karypis and V. Kumar, A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM J. Sci. Comput, vol.20, issue.1, pp.359-392, 1998.

R. Khandekar, K. Hildrum, S. Parekh, D. Rajan, J. Wolf et al., Cola: Optimizing stream processing applications via graph partitioning, Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware, Middleware '09, vol.16, pp.1-16, 2009.

D. E. Knuth, Literate programming, The Computer Journal, vol.27, issue.2, pp.97-111, 1984.

S. Kulkarni, N. Bhagat, M. Fu, V. Kedigehalli, C. Kellogg et al., Twitter heron: Stream processing at scale, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pp.239-250, 2015.

A. Lakshman and P. Malik, Cassandra: a decentralized structured storage system, ACM SIGOPS Operating Systems Review, vol.44, issue.2, pp.35-40, 2010.

L. Lamport, Distribution, 1987.

P. Leitner, A. Michlmayr, F. Rosenberg, and S. Dustdar, Monitoring, prediction and prevention of sla violations in composite services, 2010 IEEE International Conference on Web Services, pp.369-376, 2010.

L. Li, C. M. Liang, J. Liu, S. Nath, A. Terzis et al., Thermocast: A cyber-physical forecasting model for datacenters, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, pp.1370-1378, 2011.

N. Marz and J. Warren, Big Data: Principles and Best Practices of Scalable Realtime Data Systems, 2015.

A. D. Mcquarrie and C. Tsai, Regression and time series model selection, 1998.

P. Mell and T. Grance, The nist definition of cloud computing, 2011.

X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman et al., Mllib: Machine learning in apache spark, Journal of Machine Learning Research, vol.17, issue.34, pp.1-7, 2016.

D. Merkel, Docker: lightweight linux containers for consistent development and deployment, Linux Journal, issue.239, 2014.

R. C. Merkle, A digital signature based on a conventional encryption function, Conference on the Theory and Application of Cryptographic Techniques, pp.369-378, 1987.

A. Metwally, D. Agrawal, and A. E. Abbadi, Efficient computation of frequent and top-k elements in data streams, Proceedings of the 10th International Conference on Database Theory, ICDT'05, pp.398-412, 2005.

K. Müller, A. J. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen et al., Predicting time series with support vector machines, International Conference on Artificial Neural Networks, pp.999-1004, 1997.

M. A. Nasir, G. D. Morales, D. García-soriano, N. Kourtellis, and M. Serafini, The power of both choices: Practical load balancing for distributed stream processing engines, 31st IEEE International Conference on Data Engineering, ICDE, pp.137-148, 2015.

M. A. Nasir, G. D. Morales, N. Kourtellis, and M. Serafini, When two choices are not enough: Balancing at scale in distributed stream processing, 32nd IEEE International Conference on Data Engineering, ICDE, 2016.

L. Neumeyer, B. Robbins, A. Nair, and A. Kesari, S4: Distributed stream computing platform, 2010 IEEE International Conference on Data Mining Workshops, pp.170-177, 2010.

S. A. Noghabi, K. Paramasivam, Y. Pan, N. Ramesh, J. Bringhurst et al., Samza: stateful scalable stream processing at linkedin, Proceedings of the VLDB Endowment, vol.10, pp.1634-1645, 2017.

K. Ousterhout, R. Rasti, S. Ratnasamy, S. Shenker, and B. Chun, Making sense of performance in data analytics frameworks, 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), pp.293-307, 2015.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

B. Peng, M. Hosseini, Z. Hong, R. Farivar, and R. Campbell, R-storm: Resource-aware scheduling in storm, Proceedings of the 16th Annual Middleware Conference, Middleware '15, pp.149-161, 2015.

D. K. Rensin, Kubernetes-Scheduling the Future at Cloud Scale, 2015.

M. Resnick, J. Maloney, A. Monroy-hernández, N. Rusk, E. Eastmond et al., Scratch: Programming for all, Commun. ACM, vol.52, issue.11, pp.60-67, 2009.

D. Riemer, F. Kaulfersch, R. Hutmacher, and L. Stojanovic, Streampipes: solving the challenge with semantic stream processing pipelines, Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems, pp.330-331, 2015.

N. Rivetti, L. Querzoni, E. Anceaume, Y. Busnel, and B. Sericola, Efficient key grouping for near-optimal load balancing in stream processing systems, Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems, DEBS '15, pp.80-91, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01194518

J. Schmidhuber, A computer scientist's view of life, the universe, and everything, Foundations of computer science, pp.201-208, 1997.

M. Schwarzkopf, A. Konwinski, M. Abd-el-malek, and J. Wilkes, Omega: flexible, scalable schedulers for large compute clusters, SIGOPS European Conference on Computer Systems (EuroSys), pp.351-364, 2013.

S. Singh and Y. Liu, A cloud service architecture for analyzing big monitoring data, Tsinghua Science and Technology, vol.21, issue.1, pp.55-70, 2016.

E. R. Sparks, S. Venkataraman, T. Kaftan, M. J. Franklin, and B. Recht, Keystoneml: Optimizing pipelines for large-scale advanced analytics, 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp.535-546, 2017.

R. Taft, E. Mansour, M. Serafini, J. Duggan, A. J. Elmore et al., E-store: Fine-grained elastic partitioning for distributed transaction processing, Proc. VLDB Endow, vol.8, pp.245-256, 2014.

The Apache Spark developers, ML Pipelines, 2017.

The Apache Storm developers, 2017.

A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel et al., Storm@twitter, Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD '14, pp.147-156, 2014.

V. Vapnik, The nature of statistical learning theory, 1995.

A. Verma, L. Pedrosa, M. R. Korupolu, D. Oppenheimer, E. Tune et al., Large-scale cluster management at Google with Borg, Proceedings of the European Conference on Computer Systems (EuroSys), 2015.

S. Vinoski, Corba: integrating diverse applications within distributed heterogeneous environments, IEEE Communications magazine, vol.35, issue.2, pp.46-55, 1997.

D. Weise, S. Garfinkel, and S. Strassmann, The UNIX-Haters Handbook, IDG Books, 1994.

P. J. Werbos, Backpropagation through time: what it does and how to do it, Proceedings of the IEEE, vol.78, issue.10, pp.1550-1560, 1990.

Yelparchive, Pyleus, 2016.

H. Yin, A. R. Benson, J. Leskovec, and D. F. Gleich, Local higher-order graph clustering, Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '17, pp.555-564, 2017.

M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, Spark: Cluster Computing with Working Sets, Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud'10, pp.10-10, 2010.

M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker et al., Discretized Streams: Fault-tolerant Streaming Computation at Scale, Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pp.423-438, 2013.

M. Zaharia, B. Hindman, A. Konwinski, A. Ghodsi, A. D. Joseph et al., The datacenter needs an operating system, Proceedings of the 3rd USENIX Conference on Hot Topics in Cloud Computing, HotCloud'11, pp.17-17, 2011.

Å. Dragland, SINTEF, Big data, for better or worse: 90% of world's data generated over last two years.