Conference papers

Semi-Supervised Document Classification with a Mislabeling Error Model

Anastasia Krithara (1), Massih-Reza Amini (1), Cyril Goutte
(1) MALIRE - Machine Learning and Information Retrieval, LIP6 - Laboratoire d'Informatique de Paris 6
Abstract: This paper investigates a new extension of the Probabilistic Latent Semantic Analysis (PLSA) model [6] for text classification where the training set is partially labeled. The proposed approach iteratively labels the unlabeled documents and estimates the probabilities of its own labeling errors. These probabilities are then taken into account when estimating the new model parameters before the next round. Our approach outperforms an earlier semi-supervised extension of PLSA, introduced by [9], which is based on the use of fake labels, while retaining that method's simplicity and its ability to solve multiclass problems. In addition, it gives valuable information about the most uncertain and difficult classes to label. We perform experiments on the 20Newsgroups, WebKB and Reuters document collections and show the effectiveness of our approach compared with two other semi-supervised algorithms applied to these text classification problems.
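The loop described in the abstract (label the unlabeled documents, estimate how often those labels are wrong, feed the error estimates back into the next parameter update) can be illustrated with a minimal sketch. This is not the paper's model: it swaps PLSA for a plain multinomial naive Bayes classifier, and the names `fit_nb`, `posterior`, `self_train` and the matrix `beta` are hypothetical choices made for illustration only.

```python
import math

def fit_nb(docs, resp, n_classes, n_words, alpha=1.0):
    """Fit multinomial naive Bayes from soft class weights resp[i][h]."""
    prior = [alpha] * n_classes
    counts = [[alpha] * n_words for _ in range(n_classes)]
    for x, r in zip(docs, resp):
        for h in range(n_classes):
            prior[h] += r[h]
            for w in range(n_words):
                counts[h][w] += r[h] * x[w]
    total = sum(prior)
    log_prior = [math.log(p / total) for p in prior]
    log_word = [[math.log(c / sum(row)) for c in row] for row in counts]
    return log_prior, log_word

def posterior(x, log_prior, log_word):
    """Return P(y = h | x) for one word-count vector x."""
    scores = [lp + sum(c * lw for c, lw in zip(x, row))
              for lp, row in zip(log_prior, log_word)]
    top = max(scores)
    exp = [math.exp(s - top) for s in scores]
    z = sum(exp)
    return [e / z for e in exp]

def self_train(X_l, y_l, X_u, n_classes, n_rounds=5):
    """Self-training with an assumed mislabeling matrix (illustrative only)."""
    n_words = len(X_l[0])
    R_l = [[1.0 if h == y else 0.0 for h in range(n_classes)] for y in y_l]
    params = fit_nb(X_l, R_l, n_classes, n_words)
    beta = [[1.0 if k == h else 0.0 for h in range(n_classes)]
            for k in range(n_classes)]
    for _ in range(n_rounds):
        # Step 1: label the unlabeled documents with the current model.
        P_u = [posterior(x, *params) for x in X_u]
        y_tilde = [max(range(n_classes), key=p.__getitem__) for p in P_u]
        # Step 2: beta[k][h] estimates P(assigned label k | true label h).
        beta = [[1e-9] * n_classes for _ in range(n_classes)]
        for p, k in zip(P_u, y_tilde):
            for h in range(n_classes):
                beta[k][h] += p[h]
        col = [sum(beta[k][h] for k in range(n_classes))
               for h in range(n_classes)]
        beta = [[beta[k][h] / col[h] for h in range(n_classes)]
                for k in range(n_classes)]
        # Step 3: reweight each unlabeled document by the error estimates,
        # then re-estimate the parameters on labeled plus unlabeled data.
        R_u = []
        for p, k in zip(P_u, y_tilde):
            w = [beta[k][h] * p[h] for h in range(n_classes)]
            z = sum(w)
            R_u.append([v / z for v in w])
        params = fit_nb(X_l + X_u, R_l + R_u, n_classes, n_words)
    return params, beta
```

In this sketch, a column of `beta` that is far from a one-hot vector would flag a class whose documents are often assigned elsewhere, which loosely mirrors the abstract's remark about identifying uncertain classes.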
Document type: Conference papers
Contributor: Lip6 Publications
Submitted on: Tuesday, April 12, 2016 - 2:18:04 PM
Last modification on: Sunday, June 26, 2022 - 9:53:35 AM

Anastasia Krithara, Massih-Reza Amini, Cyril Goutte. Semi-Supervised Document Classification with a Mislabeling Error Model. European Conference on Information Retrieval (ECIR'08), Mar 2008, Glasgow, United Kingdom. pp.370-381, ⟨10.1007/978-3-540-78646-7_34⟩. ⟨hal-01301551⟩


