Classification non supervisée : de la multiplicité des données à la multiplicité des analyses
(Unsupervised clustering: from the multiplicity of data to the multiplicity of analyses)

Abstract: Data clustering is a major problem encountered mainly in the related fields of Artificial Intelligence, Data Analysis and Cognitive Sciences. This topic is concerned with the production of synthesis tools that are able to transform a mass of information into valuable knowledge. This knowledge extraction is done by grouping a set of objects, each associated with a set of descriptors, such that two objects in the same group are similar or share the same behaviour, while two objects from different groups do not. This thesis presents a study of several extensions of the classical clustering problem to multi-view data, where each datum can be represented by several sets of descriptors exhibiting different behaviours or aspects of it. Our study requires exploring several related problems, such as semi-supervised clustering, multi-view clustering, and collaborative approaches for consensus or alternative clustering. In the first chapter, we propose an algorithm that solves the multi-view clustering problem. In the second chapter, we propose a boosting-inspired algorithm, and an optimization-based algorithm closely related to boosting, that allow the integration of external knowledge to improve any clustering algorithm. This proposal provides an answer to the semi-supervised clustering problem. In the last chapter, we introduce a unifying framework that allows the discovery of either a set of consensus clustering solutions or a set of alternative clustering solutions, for mono-view data and/or multi-view data. Such a unifying approach offers a methodology for addressing some current hot topics in Data Mining and Knowledge Discovery in Data.
Document type: Thesis

Cited literature: 70 references

https://tel.archives-ouvertes.fr/tel-00801555
Contributor: Abes Star
Submitted on: Thursday, July 11, 2013 - 7:22:11 PM
Last modification on: Thursday, January 17, 2019 - 3:06:06 PM
Long-term archiving on: Wednesday, April 5, 2017 - 10:09:17 AM

File

jacqueshenri.sublemontier_2411...
Version validated by the jury (STAR)

Identifiers

  • HAL Id: tel-00801555, version 2


Citation

Jacques-Henri Sublemontier. Classification non supervisée : de la multiplicité des données à la multiplicité des analyses. Other [cs.OH]. Université d'Orléans, 2012. French. ⟨NNT : 2012ORLE2064⟩. ⟨tel-00801555v2⟩


Metrics

Record views: 994

File downloads: 1047