C. C. Aggarwal and C. Zhai, A Survey of Text Clustering Algorithms, Mining Text Data, pp.77-128, 2012.

R. Alghamdi and K. Alfalqi, A Survey of Topic Modeling in Text Mining, International Journal of Advanced Computer Science and Applications, vol.6, pp.147-153, 2015.

P. Bellot, A. Doucet, S. Geva, S. Gurajada, J. Kamps et al., SIGIR Forum, vol.47, pp.21-32, 2013.

A. Bifet and E. Frank, Sentiment Knowledge Discovery in Twitter Streaming Data, Discovery Science, pp.1-15, 2010.

S. Bringay, N. Béchet, F. Bouillot, P. Poncelet, M. Roche et al., Towards an On-Line Analysis of Tweets Processing, International Conference on Database and Expert Systems Applications (DEXA), pp.154-161, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00636285

M. Crane, J. S. Culpepper, J. Lin, J. Mackenzie, and A. Trotman, A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation, 10th ACM International Conference on Web Search and Data Mining (WSDM), pp.201-210, 2017.

J. Ferrarons, M. Adhana, C. Colmenares, S. Pietrowska, F. Bentayeb et al., PRIMEBALL: a Parallel Processing Framework Benchmark for Big Data Applications in the Cloud, 5th TPC Technology Conference on Performance Evaluation and Benchmarking (TPC-TC), vol.839, pp.109-124, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00921822

A. E. Gattiker, F. H. Gebara, H. P. Hofstee, J. D. Hayes et al., Big Data text-oriented benchmark creation for Hadoop, IBM Journal of Research and Development, vol.57, issue.4, pp.1-10, 2013.

J. Gray, The Benchmark Handbook for Database and Transaction Systems, 1993.

A. Guille and C. Favre, Event detection, tracking, and visualization in Twitter: a mention-anomaly-based approach, Social Network Analysis and Mining, vol.5, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01158068

S. Huang, J. Huang, J. Dai, T. Xie, and B. Huang, The HiBench benchmark suite: Characterization of the MapReduce-based data analysis, Workshops Proceedings of the 26th International Conference on Data Engineering (ICDE), pp.41-51, 2010.

J. Lin, M. Crane, A. Trotman, J. Callan, I. Chattopadhyaya et al., Toward Reproducible Baselines: The Open-Source IR Reproducibility Challenge, Advances in Information Retrieval, pp.408-420, 2016.

F. Raiber and O. Kurland, Kullback-Leibler Divergence Revisited, Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, pp.117-124, 2017.

F. Ravat, O. Teste, R. Tournier, and G. Zurfluh, Top_Keyword: an Aggregation Function for Textual Document OLAP, 10th International Conference on Data Warehousing and Knowledge Discovery, pp.55-64, 2008.

Transaction Processing Performance Council, TPC Express Benchmark HS Standard Specification, 2016.

C.-O. Truică and J. Darmont, T²K²: The Twitter Top-K Keywords Benchmark, 21st European Conference on Advances in Databases and Information Systems (ADBIS), 2017.

C.-O. Truică, J. Darmont, A. Boicea, and F. Rădulescu, Benchmarking Top-K Keyword and Top-K Document Processing with T²K² and T²K²D², Future Generation Computer Systems, vol.85, pp.60-75, 2018.

C.-O. Truică, J. Darmont, and J. Velcin, A Scalable Document-based Architecture for Text Analysis, International Conference on Advanced Data Mining and Applications (ADMA), LNAI 10086, pp.481-494, 2016.

L. Wang, J. Zhan, C. Luo, Y. Zhu, Q. Yang et al., BigDataBench: A big data benchmark suite from internet services, 20th IEEE International Symposium on High Performance Computer Architecture (HPCA), pp.488-499, 2014.