Modèles de classification en classes empiétantes, cas des modèles arborés (Classification models with overlapping classes: the case of tree-like models)

Abstract: Traditionally, classification models such as partitions and hierarchies aim to separate data without ambiguity and produce non-overlapping clusters (i.e. two clusters are either disjoint or one is included in the other). However, this lack of ambiguity can mask information, for instance hybrid plants in biology or texts that belong to two or more genres in textual analysis. More general models such as hypergraphs or lattices can represent overlapping clusters.

This work focuses on totally balanced hypergraphs that are closed under intersection, and on their equivalents. These hypergraphs are defined as hypergraphs with no special cycle (also called alpha-cycle) and generalize trees. They are equivalent to dismantlable lattices (i.e. lattices in which there recursively exists a doubly irreducible element). Their structural and algorithmic properties make them suitable for many fields, such as phylogenetics, and for different data types, such as dissimilarities, individuals/attributes matrices or graphs.

In machine learning, decision trees are widely used because they are simple to use and to understand. Part of this work develops similar methods that allow overlapping clusters, in order to give a more complete representation of the data. The resulting models are strongly interpretable and can be used for classic machine learning tasks such as class prediction. This thesis presents two methods:

  • K-Means decision trees, a classification method which builds the model on the structure of the data and gives practical results equivalent to those of decision trees;
  • gravity decision lattices, which propose a first approach to overlapping classification models.

In practice, decision trees are required to be binary. We therefore define binary hypergraphs in order to keep the simplicity specific to decision trees. We propose a characterization of binary hypergraphs by a sequence of mixed trees (similar to the characterization of totally balanced hypergraphs given by Lehel in 1985) and prove the equivalence between binarizable hypergraphs (i.e. hypergraphs that can be embedded into a binary hypergraph) and totally balanced hypergraphs, which makes these hypergraphs a natural candidate for classification methods inspired by decision trees. We also propose a binarization algorithm for dismantlable lattices which can be used in formal concept analysis.

This work also has a metric side: we define totally balanced dissimilarities (dissimilarities which are associated with a totally balanced system) and give a recognition algorithm, an approximation algorithm for these dissimilarities, and an algorithm which computes the clusters associated with a totally balanced dissimilarity.
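The distinction the abstract draws between non-overlapping and overlapping clusters can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names `overlapping_pairs` and `is_closed_under_intersection` and the toy data are hypothetical, and "closed under intersection" is assumed here to mean that the intersection of two clusters is either empty or itself a cluster.

```python
from itertools import combinations


def overlapping_pairs(clusters):
    """Return the pairs of clusters that properly overlap: they share
    at least one element, but neither is included in the other."""
    sets = [frozenset(c) for c in clusters]
    return [
        (a, b)
        for a, b in combinations(sets, 2)
        if a & b and not (a <= b or b <= a)
    ]


def is_closed_under_intersection(clusters):
    """Assumed definition: every non-empty pairwise intersection of
    clusters is itself a cluster of the family."""
    sets = {frozenset(c) for c in clusters}
    return all(
        not (a & b) or (a & b) in sets
        for a, b in combinations(sets, 2)
    )


# Hypothetical toy families over three individuals a, b, c.
hierarchy = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "b", "c"}]
with_hybrid = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"b", "c"}, {"a", "b", "c"}]

# A hierarchy has no properly overlapping pair.
print(overlapping_pairs(hierarchy))
# Here b plays the role of a "hybrid": it belongs to two clusters
# that are neither disjoint nor nested.
print(overlapping_pairs(with_hybrid))
print(is_closed_under_intersection(with_hybrid))
```

In the second family, the clusters {a, b} and {b, c} overlap on b, which a partition or a hierarchy cannot express. This check says nothing about total balancedness, which additionally requires the absence of special cycles.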
Cited literature: 106 references

https://hal.archives-ouvertes.fr/tel-02436273
Contributor: Célia Châtel
Submitted on: Sunday, January 12, 2020 - 10:03:12 PM
Last modification on: Wednesday, January 15, 2020 - 1:43:28 AM

File

these.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: tel-02436273, version 1

Citation

Célia Châtel. Modèles de classification en classes empiétantes, cas des modèles arborés. Informatique [cs]. Aix Marseille Université, 2018. Français. ⟨tel-02436273⟩
