Knowledge Integration in Deep Clustering - HAL open archive
Conference paper, Year: 2022

Knowledge Integration in Deep Clustering

Abstract

Constrained clustering, which integrates knowledge in the form of constraints into the clustering process, has been studied for more than two decades. Popular clustering algorithms such as K-means, spectral clustering and recent deep clustering methods already have constrained versions, but these usually lack expressiveness in the form of constraints they accept. In this paper we consider prior knowledge expressing relations between some data points and their assignments to clusters in propositional logic, and we show how a deep clustering framework can be extended to integrate this knowledge. To achieve this, we define an expert loss based on the weighted models of the logical formulas; the weights depend on the soft assignment of points to clusters, dynamically computed by the deep learner. This loss is integrated into the deep clustering method. We show how it can be computed efficiently using Weighted Model Counting and decomposition techniques. This method has the advantages of both integrating general knowledge and being independent of the neural architecture. Indeed, we have integrated the expert loss into two well-known deep clustering algorithms (IDEC and SCAN). Experiments have been conducted to compare our systems, IDEC-LK and SCAN-LK, to state-of-the-art methods for pairwise and triplet constraints in terms of computational cost, clustering quality and constraint satisfaction. We show that IDEC-LK can achieve results comparable to those of these systems, which are tailored to these specific constraints. To show the flexibility of our approach in learning from high-level domain constraints, we have integrated implication constraints and a new constraint, called the span-limited constraint, that limits the number of clusters a set of points can belong to. Some experiments also show that constraints on some points can be extrapolated to other similar points.
File not deposited
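To make the idea of a loss built on weighted models concrete, here is a minimal sketch, not the authors' implementation. It assumes that each point belongs to exactly one cluster, that the weight of a hard assignment is the product of the corresponding soft-assignment probabilities, and that the loss is the negative log of the total weight of the satisfying assignments (the negative-log form is an assumption; the abstract does not give the formula). The function names are hypothetical, and the brute-force enumeration is exponential in the number of constrained points, whereas the paper relies on Weighted Model Counting and decomposition techniques for efficiency.

```python
import itertools
import math


def satisfaction_probability(soft, constraint):
    """Total weight of the hard assignments that satisfy `constraint`.

    soft: list of per-point probability lists (one probability per cluster).
    constraint: function taking a tuple of cluster indices, returning bool.
    """
    n_clusters = len(soft[0])
    total = 0.0
    # Enumerate every hard assignment of the constrained points (brute force).
    for assignment in itertools.product(range(n_clusters), repeat=len(soft)):
        if constraint(assignment):
            weight = 1.0
            for point, cluster in enumerate(assignment):
                weight *= soft[point][cluster]
            total += weight
    return total


def expert_loss(soft, constraint, eps=1e-12):
    """Assumed loss form: negative log of the satisfaction probability."""
    return -math.log(satisfaction_probability(soft, constraint) + eps)


def must_link(assignment):
    """Pairwise constraint: the two points share a cluster."""
    return assignment[0] == assignment[1]


def span_limited(max_clusters):
    """The points together occupy at most `max_clusters` distinct clusters."""
    return lambda assignment: len(set(assignment)) <= max_clusters


# Hypothetical soft assignments of three points to three clusters.
soft = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]

p_must_link = satisfaction_probability(soft[:2], must_link)
p_span = satisfaction_probability(soft, span_limited(2))
loss_must_link = expert_loss(soft[:2], must_link)
```

Because any Boolean predicate over assignments can be passed as `constraint`, the same routine covers pairwise, implication and span-limited constraints in this toy setting; in a deep clustering pipeline the soft assignments would come from the network and the loss would need to be computed with a differentiable tensor library.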

Dates and versions

hal-03784741, version 1 (23-09-2022)

Identifiers

  • HAL Id: hal-03784741, version 1

Cite

Nguyen-Viet-Dung Nghiem, Christel Vrain, Thi-Bich-Hanh Dao. Knowledge Integration in Deep Clustering. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases ECMLPKDD, Sep 2022, Grenoble, France. ⟨hal-03784741⟩