The Use of Unlabeled data to Improve Supervised Learning for Text Summarization

Massih-Reza Amini, Patrick Gallinari 1
1 APA - Apprentissage et Acquisition des connaissances
LIP6 - Laboratoire d'Informatique de Paris 6
Abstract : With the huge amount of information available electronically, there is an increasing demand for automatic text summarization systems. Machine learning techniques make it possible to adapt summaries to the user's needs and to the characteristics of the corpus. These desirable properties have motivated a growing body of work in this field over the last few years. Most approaches generate summaries by extracting sentence segments and adopt the supervised learning paradigm, which requires labeling documents at the text-span level. This is a costly process that strongly limits the applicability of these methods. We investigate here the use of semi-supervised algorithms for summarization. These techniques use a small amount of labeled data together with a larger amount of unlabeled data. We propose new semi-supervised algorithms for training classification models for text summarization. We analyze their performance on two data sets: the Reuters news-wire corpus and the Computation and Language (cmp_lg) collection of TIPSTER SUMMAC. We compare them with a non-learning baseline system and a reference trainable summarizer system.
Document type :
Conference papers
Contributor : Lip6 Publications
Submitted on : Wednesday, July 12, 2017 - 5:38:55 PM
Last modification on : Thursday, March 21, 2019 - 1:03:52 PM



Massih-Reza Amini, Patrick Gallinari. The Use of Unlabeled data to Improve Supervised Learning for Text Summarization. SIGIR 2002 - 25th International ACM SIGIR Conference on Research and Development in Information Retrieval, Aug 2002, Tampere, Finland. pp.105-112, ⟨10.1145/564376.564397⟩. ⟨hal-01561460⟩


