Unsupervised Feature Construction for Improving Data Representation and Semantics

Abstract: The feature-based format is the main data representation used by machine learning algorithms. When the features do not properly describe the initial data, performance degrades. Some algorithms address this problem by internally changing the representation space, but the newly constructed features are rarely comprehensible. We seek to construct, in an unsupervised way, new features that are more appropriate for describing a given dataset and, at the same time, comprehensible to a human user. We propose two algorithms that construct the new features as conjunctions of the initial primitive features or their negations. The generated feature sets have reduced correlations between features and capture some of the hidden relations between individuals in a dataset. For example, a feature like sky ∧ ¬building ∧ panorama would be true for non-urban images and is more informative than simple features expressing the presence or absence of an object. The notion of Pareto optimality is used to evaluate feature sets and to balance the total correlation against the complexity of the resulting feature set. Statistical hypothesis testing is used to automatically determine the values of the parameters used to construct a data-dependent feature set. We show experimentally that our approaches construct informative feature sets for multiple datasets.
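The abstract describes features built as conjunctions of primitive features or their negations, and feature sets compared through Pareto optimality on correlation versus complexity. The Python sketch below illustrates those ideas under loudly stated assumptions: the data layout, all function names, the sum of absolute pairwise Pearson (phi) correlations as a stand-in for total correlation, and the literal count as the complexity measure are hypothetical choices for illustration, not the authors' two algorithms. The hypothesis-testing step that sets the parameters is not shown.

```python
from itertools import combinations
from statistics import mean

# A literal is (primitive_name, is_positive); a constructed feature is a
# conjunction (list) of literals. All names here are hypothetical.
Literal = tuple[str, bool]
Feature = list[Literal]


def feature_column(data: list[dict[str, int]], feature: Feature) -> list[int]:
    """Evaluate a conjunction of literals on every individual (row)."""
    return [
        int(all(row[name] == (1 if positive else 0) for name, positive in feature))
        for row in data
    ]


def pearson(x: list[int], y: list[int]) -> float:
    """Pearson (phi) correlation of two 0/1 columns; 0.0 if either is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def correlation_score(data: list[dict[str, int]], feature_set: list[Feature]) -> float:
    """Assumed stand-in for total correlation: sum of |pairwise correlations|."""
    cols = [feature_column(data, f) for f in feature_set]
    return sum(abs(pearson(a, b)) for a, b in combinations(cols, 2))


def complexity(feature_set: list[Feature]) -> int:
    """Assumed complexity measure: total number of literals in the set."""
    return sum(len(f) for f in feature_set)


def pareto_front(candidates):
    """Keep (name, correlation, complexity) tuples that no other candidate
    dominates, where lower is better for both objectives."""
    return [
        c for c in candidates
        if not any(
            o[1] <= c[1] and o[2] <= c[2] and (o[1] < c[1] or o[2] < c[2])
            for o in candidates
        )
    ]


# Toy image dataset (hypothetical): 1 means the object is present.
DATA = [
    {"sky": 1, "building": 0, "panorama": 1},
    {"sky": 1, "building": 1, "panorama": 0},
    {"sky": 0, "building": 1, "panorama": 0},
    {"sky": 1, "building": 0, "panorama": 1},
]

# The abstract's example feature: sky AND NOT building AND panorama.
NON_URBAN: Feature = [("sky", True), ("building", False), ("panorama", True)]

if __name__ == "__main__":
    primitives = [[("sky", True)], [("building", True)], [("panorama", True)]]
    extended = primitives + [NON_URBAN]
    candidates = [
        ("primitives", correlation_score(DATA, primitives), complexity(primitives)),
        ("with non-urban", correlation_score(DATA, extended), complexity(extended)),
    ]
    print(feature_column(DATA, NON_URBAN))
    print(pareto_front(candidates))
```

Treating correlation and complexity as two objectives to minimize, rather than merging them into one weighted score, mirrors the abstract's use of Pareto optimality: every non-dominated feature set is kept as a possible trade-off.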
Document type: Journal article
Journal of Intelligent Information Systems, Springer Verlag, 2013, 40 (3), pp.501-527. 〈10.1007/s10844-013-0235-x〉

Cited literature: 24 references

https://hal.archives-ouvertes.fr/hal-00866982
Contributor: Fabien Rico
Submitted on: Friday, 27 September 2013, 14:30:18
Last modified on: Friday, 27 September 2013, 14:47:33
Archived on: Saturday, 28 December 2013, 04:30:43

File

RIZOIU_JIIS-2013-preprint.pdf
Files produced by the author(s)

Citation

Marian-Andrei Rizoiu, Julien Velcin, Stéphane Lallich. Unsupervised Feature Construction for Improving Data Representation and Semantics. Journal of Intelligent Information Systems, Springer Verlag, 2013, 40 (3), pp.501-527. 〈10.1007/s10844-013-0235-x〉. 〈hal-00866982〉
