Semi-supervised learning using multiple clusterings with limited labeled data - HAL open archive
Journal article, Information Sciences, year: 2016

Abstract

Supervised classification consists of learning a predictive model from a set of labeled samples. It is generally accepted that the accuracy of predictive models usually increases as more labeled samples become available. Labeled samples are, however, difficult to obtain because labeling is often performed manually, a tedious and time-consuming task; users therefore generally provide only a very limited number of labeled objects. Unlabeled samples, on the contrary, are easily available. Designing approaches that work efficiently with very few labeled samples is highly challenging. In this context, semi-supervised approaches have been proposed to leverage both labeled and unlabeled data. In this paper, we focus on cases where the number of labeled samples is very limited. We review and formalize eight semi-supervised learning algorithms and introduce a new method that combines supervised and unsupervised learning in order to use both labeled and unlabeled data. The main idea of this method is to produce new features derived from a first clustering step applied to the data. These features are then used to enrich the description of the input data, leading to a better use of the data distribution. The efficiency of all the methods is compared on various artificial and UCI datasets, and on the classification of a very high resolution remote sensing image. The experiments show that our method gives good results, especially when the number of labeled samples is very limited. They also confirm that combining labeled and unlabeled data is very useful in pattern recognition.
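The abstract describes the core idea only at a high level: cluster all available samples, then use the cluster memberships as extra features for a supervised classifier. The sketch below illustrates that general idea under our own assumptions and is not the authors' implementation. It assumes plain k-means run several times with different numbers of clusters (the hypothetical `ks` parameter), a one-hot encoding of each membership scaled by a hypothetical `weight`, and a 1-nearest-neighbour classifier standing in for whatever supervised learner is actually used. All function names are illustrative, and the code uses only the Python standard library.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, iters=100):
    """Plain Lloyd's k-means (illustrative); returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        # Assignment step: each point goes to its closest center.
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_assign == assign:
            break  # converged: assignments no longer change
        assign = new_assign
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the previous center if the cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def cluster_features(points, ks, weight=1.0, seed=0):
    """Hypothetical helper: cluster ALL samples (labeled + unlabeled) once per k
    in `ks` and append a weighted one-hot encoding of each membership."""
    clusterings = [kmeans(points, k, seed + i) for i, k in enumerate(ks)]
    enriched = []
    for idx, p in enumerate(points):
        extra = []
        for k, assign in zip(ks, clusterings):
            onehot = [0.0] * k
            onehot[assign[idx]] = weight
            extra.extend(onehot)
        enriched.append(list(p) + extra)
    return enriched


def predict_1nn(train_x, train_y, x):
    """1-nearest-neighbour classifier, a stand-in for the supervised learner."""
    best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i], x))
    return train_y[best]


if __name__ == "__main__":
    # Toy data: two well-separated groups; only one sample per class is labeled.
    points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
    labeled = {0: "a", 4: "b"}  # sample index -> class label
    feats = cluster_features(points, ks=[2, 3])
    train_x = [feats[i] for i in labeled]
    train_y = list(labeled.values())
    print([predict_1nn(train_x, train_y, f) for f in feats])
    # -> ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
```

Because the clusterings are computed on labeled and unlabeled samples alike, samples that share clusters end up closer in the enriched space even when only a few of them carry labels. The toy data is separable on the raw features alone, so it only checks that the pipeline runs end to end; `weight` controls how strongly cluster agreement influences the distance relative to the raw features.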
Main file
is2016.pdf (1.01 MB)
Origin: files produced by the author(s)

Dates and versions

hal-01355189, version 1 (22-08-2016)
hal-01355189, version 2 (20-09-2018)

Cite

Germain Forestier, Cédric Wemmert. Semi-supervised learning using multiple clusterings with limited labeled data. Information Sciences, 2016, 361–362, pp. 48–65. ⟨10.1016/j.ins.2016.04.040⟩. ⟨hal-01355189v2⟩