Unsupervised Learning Informational Limit in case of Sparsely Described Examples

Abstract: This paper presents a model characterizing unsupervised learning from an information-theoretic point of view. Under some hypotheses, it defines a theoretical quality criterion, which corresponds to the informational limit that bounds the learning ability of any clustering algorithm. This quality criterion depends on the information content of the learning set. It is relevant when examples are sparsely described, i.e. when most of the descriptors are missing. This theoretical limit of any unsupervised learning algorithm is then compared to the actual learning quality of different clustering algorithms (EM, COBWEB and PRESS). This empirical comparison is based on artificial data sets that are randomly degraded. Finally, the paper shows that the results of PRESS, an algorithm specifically designed to learn from sparsely described examples, are very close to the theoretical upper bound on quality.
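The abstract mentions artificial data sets that are randomly degraded until most descriptors are missing. The exact generation protocol is not given in this record, so the following Python sketch assumes one simple scheme for illustration only: each cluster has a random boolean prototype, examples are copies of their prototype, and degradation independently replaces each descriptor value with `None` (missing) with probability `missing_rate`. The function names `make_artificial_dataset` and `degrade`, and the boolean prototype model, are hypothetical and are not taken from the paper.

```python
import random


def make_artificial_dataset(n_clusters, n_descriptors, n_per_cluster, seed=0):
    """Hypothetical generator (assumption, not the paper's protocol).

    Each cluster gets a random boolean prototype; every example of that
    cluster is an exact copy of its prototype. Returns (examples, labels).
    """
    rng = random.Random(seed)
    prototypes = [
        [rng.random() < 0.5 for _ in range(n_descriptors)]
        for _ in range(n_clusters)
    ]
    examples, labels = [], []
    for k, proto in enumerate(prototypes):
        for _ in range(n_per_cluster):
            examples.append(list(proto))  # copy so later edits do not alias
            labels.append(k)
    return examples, labels


def degrade(examples, missing_rate, seed=0):
    """Randomly hide descriptors to produce sparsely described examples.

    Each value is independently replaced by None (missing) with
    probability missing_rate; the original list is left unchanged.
    """
    rng = random.Random(seed)
    return [
        [None if rng.random() < missing_rate else v for v in ex]
        for ex in examples
    ]


if __name__ == "__main__":
    data, labels = make_artificial_dataset(n_clusters=3, n_descriptors=20, n_per_cluster=10)
    sparse = degrade(data, missing_rate=0.9)
    total = sum(len(ex) for ex in sparse)
    missing = sum(v is None for ex in sparse for v in ex)
    print(f"{missing}/{total} descriptor values missing")
```

With `missing_rate=0.9`, roughly nine descriptor values out of ten are hidden, which matches the "most of the descriptors are missing" setting the abstract describes. A clustering algorithm would then be run on `sparse`, and its output compared against the known `labels`.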
Document type: Book section

https://hal.archives-ouvertes.fr/hal-01335820
Contributor: Lip6 Publications
Submitted on: Wednesday, June 22, 2016 - 1:47:40 PM
Last modification on: Thursday, March 21, 2019 - 1:05:10 PM

Citation

Jean-Gabriel Ganascia, Julien Velcin. Unsupervised Learning Informational Limit in case of Sparsely Described Examples. Selected Contributions in Classification and Data Analysis, Springer, pp. 345-355, 2007. ISBN 978-3-540-73558-8. DOI: 10.1007/978-3-540-73560-1_32. HAL: hal-01335820.
