Summary: Data clustering is a well-known task in data mining, and it often relies on distances or, in some cases, similarity measures. The latter is indeed the case for real-world datasets that comprise categorical attributes. Several similarity measures have been proposed in the literature; however, the right choice depends on the context and the dataset at hand. In this paper, we address the following question: given a set of measures, which one is best suited for clustering a particular dataset? We propose an approach to automate this choice, and we present an empirical study based on categorical datasets.