Self supervised learning for automatic text summarization by text span extraction

Massih-Reza Amini, Patrick Gallinari 1
1 APA - Apprentissage et Acquisition des connaissances
LIP6 - Laboratoire d'Informatique de Paris 6
Abstract : We describe a system for automatic text summarization that extracts the sentences of a document most relevant to a query. The lack of labeled corpora makes it difficult to develop automatic summarization techniques. We propose a self-supervised method that learns to rank sentences for the summary without relying on labeled corpora. The method operates in two steps: first, a statistical similarity-based system that requires no training is developed; second, a classifier is trained with self-supervised learning to improve on this baseline. The approach is evaluated on the Reuters news-wire corpus and compared with other strategies.
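The two steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the sentence features, or the classifier, so everything below is an assumption made for illustration: cosine similarity over term counts as the training-free baseline, a small logistic regression as the classifier, and a loop that treats the current top-k sentences as pseudo-labels. All function names are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a deliberately simple tokenizer (assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def baseline_scores(sentences, query):
    # Step 1: training-free baseline, scoring each sentence against the query.
    q = Counter(tokenize(query))
    return [cosine(Counter(tokenize(s)), q) for s in sentences]


def features(sentence, query):
    # Illustrative features (assumption): bias, similarity, query-term coverage.
    s, q = Counter(tokenize(sentence)), Counter(tokenize(query))
    coverage = len(set(s) & set(q)) / len(q) if q else 0.0
    return [1.0, cosine(s, q), coverage]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def train(X, y, epochs=200, lr=0.5):
    # Logistic regression fitted by stochastic gradient ascent on the log-likelihood.
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, t in zip(X, y):
            p = predict(w, x)
            w = [wi + lr * (t - p) * xi for wi, xi in zip(w, x)]
    return w


def top_k_labels(scores, k):
    # Mark the k highest-scoring sentences as summary sentences (label 1).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:k])
    return [1 if i in chosen else 0 for i in range(len(scores))]


def self_supervised_summary(sentences, query, k=2, max_rounds=10):
    # Step 2: start from the baseline's pseudo-labels, then alternate between
    # training the classifier and relabelling until the labels stop changing.
    X = [features(s, query) for s in sentences]
    labels = top_k_labels(baseline_scores(sentences, query), k)
    for _ in range(max_rounds):
        w = train(X, labels)
        new_labels = top_k_labels([predict(w, x) for x in X], k)
        if new_labels == labels:
            break
        labels = new_labels
    return [s for s, label in zip(sentences, labels) if label]
```

Calling `self_supervised_summary(sentences, query, k=2)` returns two sentences in document order. With features this simple the classifier will largely reproduce the baseline ranking; the sketch shows the shape of the training loop, not the improvement reported in the paper.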
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01571863
Contributor : Lip6 Publications
Submitted on : Thursday, August 3, 2017 - 5:28:06 PM
Last modification on : Thursday, March 21, 2019 - 1:09:54 PM

Identifiers

  • HAL Id : hal-01571863, version 1

Citation

Massih-Reza Amini, Patrick Gallinari. Self supervised learning for automatic text summarization by text span extraction. The 23rd BCS European Annual Colloquium on Information Retrieval (ECIR'01), 2001, Darmstadt, Germany. pp.55-63. ⟨hal-01571863⟩
