S. Bringay, N. Béchet, F. Bouillot, P. Poncelet, M. Roche et al., Towards an On-Line Analysis of Tweets Processing, International Conference on Database and Expert Systems Applications (DEXA), pp.154-161, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00636285

J. D. Cooper, M. D. Robinson, J. A. Slansky, and N. D. Kiger, Literacy: Helping students construct meaning, Cengage Learning, 2014.

J. Darmont, Data Processing Benchmarks, IGI Global, pp.146-152, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00978026

J. Ferrarons, M. Adhana, C. Colmenares, S. Pietrowska, F. Bentayeb et al., PRIMEBALL: A Parallel Processing Framework Benchmark for Big Data Applications in the Cloud, 5th TPC Technology Conference on Performance Evaluation and Benchmarking, pp.109-124, 2013.
DOI : 10.1007/978-3-319-04936-6_8
URL : https://hal.archives-ouvertes.fr/hal-00921822

A. E. Gattiker, F. H. Gebara, H. P. Hofstee, J. D. Hayes, and A. Hylick, Big Data text-oriented benchmark creation for Hadoop, IBM Journal of Research and Development, vol.57, issue.3/4, pp.10:1-10:6, 2013.
DOI : 10.1147/JRD.2013.2240732

J. Gray, The Benchmark Handbook for Database and Transaction Systems, 1993.

A. Guille and C. Favre, Event detection, tracking, and visualization in Twitter: a mention-anomaly-based approach, Social Network Analysis and Mining, vol.5, issue.1, p.18, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01154825

D. D. Lewis, Y. Yang, T. G. Rose, and F. Li, RCV1: A new benchmark collection for text categorization research, Journal of Machine Learning Research, vol.5, pp.361-397, 2004.

C. D. Manning, P. Raghavan, and H. Schütze, Introduction to information retrieval, 2008.
DOI : 10.1017/CBO9780511809071

J. O'Shea, Z. Bandar, K. Crockett, and D. McLean, Benchmarking short text semantic similarity, International Journal of Intelligent Information and Database Systems, vol.4, issue.2, pp.103-120, 2010.
DOI : 10.1504/IJIIDS.2010.032437

G. Paltoglou and M. Thelwall, A study of information retrieval weighting schemes for sentiment analysis, 48th Annual Meeting of the Association for Computational Linguistics, pp.1386-1395, 2010.

I. Partalas, A. Kosmopoulos, N. Baskiotis, T. Artières, G. Paliouras et al., LSHTC: A benchmark for large-scale text classification, arXiv:1503.08581, 2015.

F. Ravat, O. Teste, R. Tournier, and G. Zurfluh, Top keyword: an aggregation function for textual document OLAP, 10th International Conference on Data Warehousing and Knowledge Discovery (DaWaK), pp.55-64, 2008.
DOI : 10.1007/978-3-540-85836-2_6

A. J. Reagan, B. F. Tivnan, J. R. Williams, C. M. Danforth, and P. S. Dodds, Benchmarking sentiment analysis methods for large-scale texts: A case for using continuum-scored words and word shift graphs, arXiv:1512.00531, 2015.

K. Sparck Jones, S. Walker, and S. E. Robertson, A probabilistic model of information retrieval: development and comparative experiments: Part 1, Information Processing & Management, vol.36, issue.6, pp.779-808, 2000.
DOI : 10.1016/S0306-4573(00)00015-7

K. Sparck Jones, S. Walker, and S. E. Robertson, A probabilistic model of information retrieval: development and comparative experiments: Part 2, Information Processing & Management, vol.36, issue.6, pp.809-840, 2000.
DOI : 10.1016/S0306-4573(00)00016-9

C. O. Truică, J. Darmont, and J. Velcin, A Scalable Document-Based Architecture for Text Analysis, International Conference on Advanced Data Mining and Applications (ADMA), pp.481-494, 2016.

L. Wang, X. Dong, X. Zhang, Y. Wang, T. Ju et al., TextGen: a realistic text data content generation method for modern storage system benchmarks, Frontiers of Information Technology & Electronic Engineering, vol.17, issue.10, pp.982-993, 2016.
DOI : 10.1631/FITEE.1500332