HAL open archive
Journal article, Data Mining and Knowledge Discovery, Year: 2019

Noise-free Latent Block Model for High Dimensional Data

Charlotte Laclau
Vincent Brault

Abstract

Co-clustering is known to be a powerful and efficient approach in unsupervised learning because it partitions data along both the observations and the variables of a given dataset. However, in high-dimensional settings, co-clustering methods may fail to provide meaningful results due to the presence of noisy and/or irrelevant features. In this paper, we tackle this issue by proposing a novel co-clustering model that assumes the existence of a noise cluster containing all irrelevant features. A variational expectation-maximization (VEM)-based algorithm is derived for this task, in which automatic variable selection and the joint clustering of objects and variables are achieved via a Bayesian framework. Experimental results on synthetic datasets show the efficiency of our model in the context of high-dimensional noisy data. Finally, we highlight the interest of the approach on two real datasets whose goal is to study genetic diversity across the world.
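
The noise-cluster assumption described in the abstract can be illustrated with a small synthetic example. The sketch below is not the authors' VEM algorithm; it only generates data under one possible reading of the model, assuming Gaussian entries, two row clusters, two informative column clusters, and arbitrarily chosen block means. The function names `simulate_noisy_blocks` and `row_cluster_gap` are hypothetical.

```python
import numpy as np


def simulate_noisy_blocks(n_rows=60, n_cols=40, n_noise_cols=20, seed=0):
    """Generate a block-structured matrix plus a set of pure-noise columns.

    Assumptions (illustrative only): Gaussian entries with unit variance,
    two row clusters, two informative column clusters, hand-picked means.
    """
    rng = np.random.default_rng(seed)
    # Alternate row labels so both row clusters are guaranteed non-empty.
    row_labels = np.arange(n_rows) % 2
    col_labels = np.arange(n_cols) % 2
    # Mean of each (row cluster, column cluster) block.
    block_means = np.array([[0.0, 4.0],
                            [4.0, 0.0]])
    means = block_means[row_labels][:, col_labels]  # shape (n_rows, n_cols)
    informative = rng.normal(loc=means, scale=1.0)
    # Noise columns: same distribution whatever the row cluster.
    noise = rng.normal(loc=0.0, scale=1.0, size=(n_rows, n_noise_cols))
    X = np.hstack([informative, noise])
    # Label -1 marks membership in the noise cluster.
    col_labels_full = np.concatenate([col_labels, np.full(n_noise_cols, -1)])
    return X, row_labels, col_labels_full


def row_cluster_gap(X, row_labels):
    """Absolute difference between the two row-cluster means, per column."""
    mean0 = X[row_labels == 0].mean(axis=0)
    mean1 = X[row_labels == 1].mean(axis=0)
    return np.abs(mean0 - mean1)


X, row_labels, col_labels = simulate_noisy_blocks()
gap = row_cluster_gap(X, row_labels)
print("mean gap, informative columns:", gap[col_labels >= 0].mean())
print("mean gap, noise columns:", gap[col_labels == -1].mean())
```

In this toy setting, the informative columns separate the two row clusters while the noise columns do not, which is the kind of feature a noise cluster is meant to absorb.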
Main file
Laclau2018Noise.pdf (9.77 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01685777, version 1 (16-01-2018)
hal-01685777, version 2 (29-10-2018)

Cite

Charlotte Laclau, Vincent Brault. Noise-free Latent Block Model for High Dimensional Data. Data Mining and Knowledge Discovery, 2019, 33 (2), pp.446-473. ⟨10.1007/s10618-018-0597-3⟩. ⟨hal-01685777v2⟩
