Adequacy of a User-Defined Vocabulary to the Data Structure
Abstract
Clustering methods are of particular interest for discovering and summarizing the structure of a data set. However, interpreting clusters can be difficult for inexperienced users, who most of the time have their own vocabulary for describing data and their properties. This article proposes an approach to determine and quantify how appropriate a user-defined vocabulary is with respect to the structure that a clustering method captures from the data distribution. Two clustering-based measures of vocabulary appropriateness are proposed and tested on artificial data.