A Filtering Algorithm for Constrained Clustering with Within-Cluster Sum of Dissimilarities Criterion
Abstract
Constrained clustering is an important task in Data Mining. Over the last ten years, much work has gone into extending classical clustering algorithms to handle user-defined constraints, but each extension is restricted to one kind of user constraint. In a previous work \cite{ecml2013}, we proposed a declarative and generic framework, based on Constraint Programming, which allows a clustering task to be designed by specifying an optimization criterion and different kinds of user constraints. One of the criteria is the within-cluster sum of dissimilarities, which is represented by a sum constraint and reified equality constraints. A direct implementation using predefined constraints is inefficient because these constraints propagate weakly. In this paper, we treat this criterion as a global constraint and develop a filtering algorithm for it. This filtering significantly improves the performance of the model. Experiments on classical datasets demonstrate the benefit of our approach.
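As an illustrative sketch of how the criterion decomposes into a sum constraint over reified equalities, one may write it as follows. The notation is assumed here and is not taken from the paper: $n$ is the number of objects, $d(i,j)$ the dissimilarity between objects $i$ and $j$, $G_i$ the variable giving the cluster of object $i$, and $V$ the value of the criterion. The abstract does not say whether dissimilarities are squared, so $d(i,j)$ should be read as whichever measure is used.

\[
V \;=\; \sum_{1 \le i < j \le n} [\,G_i = G_j\,]\; d(i,j),
\qquad
[\,G_i = G_j\,] =
\begin{cases}
1 & \text{if } G_i = G_j,\\
0 & \text{otherwise.}
\end{cases}
\]

Under this reading, each bracket $[\,G_i = G_j\,]$ corresponds to one reified equality constraint and the outer summation to the sum constraint, so a direct model requires $n(n-1)/2$ reified constraints.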