The Use of Unlabeled data to Improve Supervised Learning for Text Summarization

Massih-Reza Amini; Patrick Gallinari

doi:10.1145/564376.564397

Conference paper. Year: 2002


Massih-Reza Amini

Role: Author
PersonId: 747054
IdHAL: massih-reza-amini
ORCID: 0000-0001-9032-4233
IdRef: 132277042

Apprentissage et Acquisition des connaissances

Patrick Gallinari

Role: Author
PersonId: 751615
IdHAL: patrick-gallinari
ORCID: 0000-0001-9060-9001
IdRef: 070709076

Apprentissage et Acquisition des connaissances

Abstract

With the huge amount of information available electronically, there is an increasing demand for automatic text summarization systems. Using machine learning techniques for this task makes it possible to adapt summaries to user needs and to the characteristics of the corpus. These desirable properties have motivated a growing body of work in this field over the last few years. Most approaches attempt to generate summaries by extracting sentence segments and adopt the supervised learning paradigm, which requires labeling documents at the text-span level. This is a costly process that strongly limits the applicability of these methods. We investigate here the use of semi-supervised algorithms for summarization. These techniques use a small amount of labeled data together with a larger amount of unlabeled data. We propose new semi-supervised algorithms for training classification models for text summarization. We analyze their performance on two data sets: the Reuters newswire corpus and the Computation and Language (cmp_lg) collection of TIPSTER SUMMAC. We compare them with a non-learning baseline system and with a reference trainable summarizer.
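To make the idea of combining a few labeled sentences with many unlabeled ones concrete, here is a minimal self-training sketch. It is a generic illustration of semi-supervised sentence classification, not the algorithm proposed in the paper: the nearest-centroid classifier, the function names (`centroid`, `predict`, `self_train`) and the two-dimensional sentence feature vectors are all hypothetical assumptions. Label 1 marks a sentence to extract into the summary and 0 one to skip; the loop repeatedly fits the classifier on the labeled sentences plus the current pseudo-labels of the unlabeled ones, then relabels the unlabeled sentences until the labels stop changing.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical sentence representation: a small tuple of numeric features
# (for example, normalized position in the document or overlap with the title).
Vector = Tuple[float, ...]


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(centroids: Dict[int, Vector], x: Vector) -> int:
    # Assign x to the class whose centroid is closest.
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


def self_train(labeled: List[Tuple[Vector, int]],
               unlabeled: List[Vector],
               max_iter: int = 20) -> List[int]:
    """Return 0/1 labels (1 = extract into summary) for the unlabeled sentences."""
    if {y for _, y in labeled} != {0, 1}:
        raise ValueError("labeled data must contain both classes 0 and 1")
    pseudo: List[int] = []
    for _ in range(max_iter):
        # Train on labeled sentences plus current pseudo-labels
        # (on the first pass pseudo is empty, so only labeled data is used).
        pool = labeled + list(zip(unlabeled, pseudo))
        centroids = {label: centroid([x for x, y in pool if y == label])
                     for label in (0, 1)}
        # Relabel every unlabeled sentence with the retrained classifier.
        new_pseudo = [predict(centroids, x) for x in unlabeled]
        if new_pseudo == pseudo:  # labels are stable: stop iterating
            break
        pseudo = new_pseudo
    return pseudo


if __name__ == "__main__":
    labeled = [((0.0, 0.0), 0), ((1.0, 1.0), 1)]
    unlabeled = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (1.1, 0.9)]
    print(self_train(labeled, unlabeled))  # prints [0, 1, 0, 1]
```

The hard-label loop is kept only for brevity; a probabilistic classifier updated with soft class-membership estimates (an EM-style variant) would be a natural alternative under the same assumptions.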

Domains

Computer Science [cs]


https://hal.science/hal-01561460

Submitted on: Wednesday, 12 July 2017, 17:38:55

Last modified on: Thursday, 14 March 2024, 14:42:50

Dates and versions

hal-01561460, version 1 (12-07-2017)

Identifiers

HAL Id: hal-01561460, version 1
DOI: 10.1145/564376.564397

Cite

Massih-Reza Amini, Patrick Gallinari. The Use of Unlabeled data to Improve Supervised Learning for Text Summarization. SIGIR 2002 - 25th International ACM SIGIR Conference on Research and Development in Information Retrieval, Aug 2002, Tampere, Finland. pp.105-112, ⟨10.1145/564376.564397⟩. ⟨hal-01561460⟩


Collections

UPMC, CNRS, LIP6, SORBONNE-UNIVERSITE, SU-SCIENCES
