Knowledge Integration in Deep Clustering

Abstract : Constrained clustering, which integrates knowledge in the form of constraints into a clustering process, has been studied for more than two decades. Popular clustering algorithms such as K-means, spectral clustering and, more recently, deep clustering already have constrained versions, but these usually lack expressiveness in the form of constraints they accept. In this paper we consider prior knowledge, expressed in propositional logic, about relations between some data points and their assignments to clusters, and we show how a deep clustering framework can be extended to integrate this knowledge. To achieve this, we define an expert loss based on the weighted models of the logical formulas; the weights depend on the soft assignment of points to clusters, which is computed dynamically by the deep learner. This loss is integrated into the deep clustering method. We show how it can be computed efficiently using Weighted Model Counting and decomposition techniques. The method has the advantages of both integrating general knowledge and being independent of the neural architecture. Indeed, we have integrated the expert loss into two well-known deep clustering algorithms (IDEC and SCAN). Experiments compare our systems, IDEC-LK and SCAN-LK, to state-of-the-art methods for pairwise and triplet constraints in terms of computational cost, clustering quality and constraint satisfaction. We show that IDEC-LK can achieve results comparable to those of these systems, which are tailored to these specific constraints. To show the flexibility of our approach in learning from high-level domain constraints, we have integrated implication constraints and a new constraint, called the span-limited constraint, which limits the number of clusters a set of points can belong to. Further experiments show that constraints on some points can be extrapolated to other similar points.
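
As an illustration of the idea described in the abstract, the following is a minimal sketch of a loss built from the weighted models of a constraint. It is not the authors' implementation: the function names, the toy soft assignments, the choice of the negative log of the weighted model count as the loss, and the brute-force enumeration are all assumptions made for this example. The abstract states that the paper relies on Weighted Model Counting and decomposition techniques for efficiency, which this sketch does not attempt to reproduce.

```python
import itertools
import math


def weighted_model_count(soft, formula):
    """Sum the weights of all hard assignments that satisfy `formula`.

    soft[i][k] is the soft assignment of point i to cluster k
    (each row is assumed to sum to 1).
    formula takes a tuple of cluster indices, one per point, and
    returns True when the assignment satisfies the constraint.
    Brute force: enumerates every assignment, so it is only usable
    for a handful of points (an assumption of this sketch).
    """
    n_clusters = len(soft[0])
    total = 0.0
    for assignment in itertools.product(range(n_clusters), repeat=len(soft)):
        if formula(assignment):
            weight = 1.0
            for point, cluster in enumerate(assignment):
                weight *= soft[point][cluster]
            total += weight
    return total


def expert_loss(soft, formula, eps=1e-12):
    """Hypothetical loss: negative log of the weighted model count."""
    return -math.log(weighted_model_count(soft, formula) + eps)


# Toy soft assignments for three points and three clusters (made up).
soft = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]


def must_link_0_1(a):
    """Pairwise constraint: points 0 and 1 share a cluster."""
    return a[0] == a[1]


def implication(a):
    """If point 0 is in cluster 0, then point 2 is not in cluster 0."""
    return a[0] != 0 or a[2] != 0


def span_limited(max_clusters):
    """The points may occupy at most `max_clusters` distinct clusters."""
    return lambda a: len(set(a)) <= max_clusters


for name, constraint in [
    ("must-link", must_link_0_1),
    ("implication", implication),
    ("span-limited (1)", span_limited(1)),
]:
    print(name, weighted_model_count(soft, constraint), expert_loss(soft, constraint))
```

Because the weight of each hard assignment is a product of soft assignments, the count is a differentiable function of the network outputs. This is the property that would let such a term be added to a deep clustering objective, provided an automatic differentiation framework is used in place of plain Python floats.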
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-03784741
Contributor : Thi-Bich-Hanh Dao
Submitted on : Friday, September 23, 2022 - 12:08:15 PM
Last modification on : Saturday, September 24, 2022 - 3:46:03 AM

Identifiers

  • HAL Id : hal-03784741, version 1

Citation

Nguyen-Viet-Dung Nghiem, Christel Vrain, Thi-Bich-Hanh Dao. Knowledge Integration in Deep Clustering. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases ECMLPKDD, Sep 2022, Grenoble, France. ⟨hal-03784741⟩
