Unsupervised Learning Informational Limit in case of Sparsely Described Examples - Archive ouverte HAL
Book chapter, 2007

Unsupervised Learning Informational Limit in case of Sparsely Described Examples

Abstract

This paper presents a model characterizing unsupervised learning from an information-theoretic point of view. Under certain hypotheses, it defines a theoretical quality criterion, which corresponds to the informational limit that bounds the learning ability of any clustering algorithm. This quality criterion depends on the information content of the learning set. It is relevant when examples are sparsely described, i.e. when most of the descriptors are missing. This theoretical limit of any unsupervised learning algorithm is then compared to the actual learning quality of different clustering algorithms (EM, COBWEB and PRESS). This empirical comparison is based on artificial data sets that are randomly degraded. Finally, the paper shows that the results of PRESS, an algorithm specifically designed to learn from sparsely described examples, are very close to the theoretical upper-bound quality.
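The degradation protocol the abstract refers to can be illustrated with a toy sketch (this is not the authors' experimental code; the prototype-matching classifier below is a hypothetical stand-in for the clustering algorithms evaluated in the paper, used only to show how accuracy decays as descriptors are randomly removed):

```python
import random

def make_examples(n_per_cluster=50, n_descriptors=8, seed=0):
    """Generate a toy learning set: two clusters of boolean descriptor
    vectors, cluster 0 mostly 0s, cluster 1 mostly 1s (noise rate 0.1)."""
    rng = random.Random(seed)
    examples, labels = [], []
    for label, base in [(0, 0), (1, 1)]:
        for _ in range(n_per_cluster):
            ex = [base if rng.random() > 0.1 else 1 - base
                  for _ in range(n_descriptors)]
            examples.append(ex)
            labels.append(label)
    return examples, labels

def degrade(examples, missing_rate, seed=1):
    """Randomly replace descriptor values with None, simulating
    sparsely described examples."""
    rng = random.Random(seed)
    return [[None if rng.random() < missing_rate else v for v in ex]
            for ex in examples]

def majority_classify(example):
    """Naive clustering proxy: assign each example to the cluster whose
    prototype (all-0s or all-1s) matches the majority of its observed
    descriptors; ties and fully missing examples go to cluster 0."""
    observed = [v for v in example if v is not None]
    if not observed:
        return 0
    return 1 if sum(observed) * 2 > len(observed) else 0

def accuracy(examples, labels):
    correct = sum(majority_classify(ex) == lab
                  for ex, lab in zip(examples, labels))
    return correct / len(labels)

examples, labels = make_examples()
for rate in (0.0, 0.5, 0.9):
    acc = accuracy(degrade(examples, rate), labels)
    print(f"missing rate {rate:.1f}: accuracy {acc:.2f}")
```

As the missing rate grows, the observed descriptors carry less information about cluster membership, so accuracy drops toward chance; the paper's contribution is to quantify the best achievable quality at each degradation level as an information-theoretic bound.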

Dates and versions

hal-01335820 , version 1 (22-06-2016)

Identifiers

Cite

Jean-Gabriel Ganascia, Julien Velcin. Unsupervised Learning Informational Limit in case of Sparsely Described Examples. Selected Contributions in Classification and Data Analysis, Springer, pp.345-355, 2007, 978-3-540-73558-8. ⟨10.1007/978-3-540-73560-1_32⟩. ⟨hal-01335820⟩