Learning Aspect Models with Partially Labeled Data

Anastasia Krithara; Massih-Reza Amini; Cyril Goutte; Jean-Michel Renders

doi:10.1016/j.patrec.2010.09.004

Journal article, Pattern Recognition Letters, Year: 2011

Anastasia Krithara

Role: Author

Massih-Reza Amini

Role: Author
PersonId: 747054
IdHAL: massih-reza-amini
ORCID: 0000-0001-9032-4233
IdRef: 132277042
Affiliation: Machine Learning and Information Retrieval

Cyril Goutte

Role: Author

Jean-Michel Renders

Role: Author

Abstract

In this paper, we address the problem of learning aspect models with partially labeled data for the task of document categorization. The motivation of this work is to take advantage of the amount of available unlabeled data, together with the set of labeled examples, to learn latent models whose structure and underlying hypotheses account for the document generation process more accurately than other mixture-based generative models do. We present a semi-supervised variant of the Probabilistic Latent Semantic Analysis (PLSA) model. In our approach, we try to capture the data mislabeling errors that may occur during the training of our model. This is done by iteratively assigning class labels to unlabeled documents. We run experiments on document collections, as well as on a real-world dataset coming from a Business Group of Xerox, and show the effectiveness of our approach compared to a semi-supervised version of Naive Bayes, another semi-supervised version of PLSA, and transductive Support Vector Machines.
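
For readers unfamiliar with the aspect model that the abstract builds on, the following is a minimal sketch of standard, fully unsupervised PLSA fitted with EM. It is not the semi-supervised variant proposed in the paper and does not model mislabeling errors. The function name `plsa_em`, its parameters, and the toy count matrix in the usage note are assumptions made for illustration only.

```python
import math
import random


def normalize(values):
    """Scale non-negative numbers so they sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa_em(counts, n_aspects, n_iters=50, seed=0):
    """Fit asymmetric PLSA, P(w|d) = sum_z P(w|z) P(z|d), with plain EM.

    counts[d][w] is the number of times word w occurs in document d.
    Returns (p_w_given_z, p_z_given_d, log_likelihoods), where
    log_likelihoods[i] is measured with the parameters entering iteration i.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialisation, normalised into distributions.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_aspects)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_aspects)])
             for _ in range(n_docs)]
    history = []
    for _ in range(n_iters):
        new_w_z = [[0.0] * n_words for _ in range(n_aspects)]
        new_z_d = [[0.0] * n_aspects for _ in range(n_docs)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: posterior P(z | d, w) is proportional to P(z|d) P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_aspects)]
                p_w_d = sum(joint)
                log_lik += n_dw * math.log(p_w_d)
                for z in range(n_aspects):
                    # Accumulate expected counts for the M-step.
                    expected = n_dw * joint[z] / p_w_d
                    new_w_z[z][w] += expected
                    new_z_d[d][z] += expected
        # M-step: renormalise expected counts into probabilities.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
        history.append(log_lik)
    return p_w_z, p_z_d, history
```

A call such as `plsa_em([[4, 3, 0, 0], [0, 0, 3, 4]], n_aspects=2)` returns the per-aspect word distributions, the per-document aspect mixtures, and the log-likelihood trace. A semi-supervised variant would additionally have to tie aspects to class labels, which this sketch leaves out.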

Domains

Computer Science [cs]

https://hal.science/hal-01172498

Submitted on: Tuesday, 7 July 2015, 14:47:14

Last modified on: Thursday, 14 March 2024, 14:40:45

Dates and versions

hal-01172498, version 1 (07-07-2015)

Identifiers

HAL Id: hal-01172498, version 1
DOI: 10.1016/j.patrec.2010.09.004

Cite

Anastasia Krithara, Massih-Reza Amini, Cyril Goutte, Jean-Michel Renders. Learning Aspect Models with Partially Labeled Data. Pattern Recognition Letters, 2011, 32 (2), pp.297-304. ⟨10.1016/j.patrec.2010.09.004⟩. ⟨hal-01172498⟩

Collections

UPMC CNRS LIP6 SORBONNE-UNIVERSITE SU-SCIENCES
