Automatic Text Summarization based on Word-Clusters and Ranking Algorithms

Massih-Reza Amini; Nicolas Usunier; Patrick Gallinari

doi:10.1007/978-3-540-31865-1_11

Conference paper, Year: 2005

Massih-Reza Amini

Role: Author
PersonId: 747054
IdHAL: massih-reza-amini
ORCID: 0000-0001-9032-4233
IdRef: 132277042

Machine Learning and Information Retrieval

Nicolas Usunier

Role: Author
PersonId: 933831

Machine Learning and Information Retrieval

Patrick Gallinari

Role: Author
PersonId: 751615
IdHAL: patrick-gallinari
ORCID: 0000-0001-9060-9001
IdRef: 070709076

Machine Learning and Information Retrieval

Abstract

This paper investigates a new approach to Single Document Summarization (SDS) based on a machine learning ranking algorithm. Using machine learning techniques for this task allows summaries to be adapted to user needs and to corpus characteristics. These desirable properties have motivated an increasing amount of work in this field over the last few years. Most approaches generate summaries by extracting text spans (sentences in our case) and adopt the classification framework, which consists of training a classifier to discriminate between relevant and irrelevant spans of a document. A set of features is first used to produce a vector of scores for each sentence in a given document, and a classifier is trained to make a global combination of these scores. We believe that the classification criterion for training a classifier is not suited to SDS, and we propose an original framework based on ranking for this task. A ranking algorithm also combines the scores of different features, but its criterion tends to reduce the relative misordering of sentences within a document. The features we use here are either based on the state of the art or built upon word clusters. These clusters are groups of words that often co-occur with each other, and they can serve to expand a query or to enrich the representation of the sentences of the documents. We analyze the performance of our ranking algorithm on two data sets: the Computation and Language (cmp_lg) collection of TIPSTER SUMMAC and the WIPO collection. We compare against several baseline (non-learning) systems and a reference trainable summarizer based on the classification framework. The experiments show that the learning algorithms perform better than the non-learning systems, while the ranking algorithm outperforms the classifier. The difference in performance between the two learning algorithms depends on the nature of the datasets. We explain this fact by the different separability hypotheses that the two learning algorithms make about the data.
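
The ranking criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state the loss or the optimizer, so the pairwise logistic loss, the plain gradient descent, the function names (`score`, `misordered_pairs`, `train_ranker`) and the toy feature values below are all assumptions made for illustration. The sketch learns a linear combination of per-sentence feature scores so that, within each document, relevant sentences are scored above irrelevant ones.

```python
import math

def score(w, x):
    """Linear combination of a sentence's feature scores."""
    return sum(wi * xi for wi, xi in zip(w, x))

def split(sentences):
    """Separate one document's sentences into relevant and irrelevant ones."""
    pos = [x for x, y in sentences if y == 1]
    neg = [x for x, y in sentences if y == 0]
    return pos, neg

def misordered_pairs(w, docs):
    """Count (relevant, irrelevant) pairs of the same document in which
    the relevant sentence is not scored strictly higher."""
    errors = 0
    for sentences in docs:
        pos, neg = split(sentences)
        errors += sum(score(w, p) <= score(w, n) for p in pos for n in neg)
    return errors

def train_ranker(docs, n_features, lr=0.1, epochs=200):
    """Gradient descent on an assumed pairwise logistic loss:
    sum over pairs of log(1 + exp(-(score(p) - score(n))))."""
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for sentences in docs:
            pos, neg = split(sentences)
            for p in pos:
                for n in neg:
                    diff = [pi - ni for pi, ni in zip(p, n)]
                    margin = score(w, diff)
                    # Derivative of log(1 + exp(-margin)) w.r.t. margin
                    # is -1 / (1 + exp(margin)).
                    coeff = 1.0 / (1.0 + math.exp(margin))
                    for k in range(n_features):
                        grad[k] -= coeff * diff[k]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

# Hypothetical toy data: two documents, each a list of
# (feature_vector, label) pairs with label 1 = relevant sentence.
docs = [
    [([0.9, 0.2], 1), ([0.6, 0.3], 0), ([0.5, 0.1], 0)],
    [([0.4, 0.1], 1), ([0.1, 0.2], 0), ([0.2, 0.3], 0)],
]

w = train_ranker(docs, n_features=2)
print("learned weights:", w)
print("misordered pairs:", misordered_pairs(w, docs))
```

In this toy data the relevant sentence of the second document has a lower first-feature value (0.4) than two irrelevant sentences of the first document (0.6 and 0.5), so no single threshold on that feature separates relevant from irrelevant sentences across documents, while ordering the sentences within each document remains possible. This is one way to picture why a criterion defined on within-document pairs can behave differently from a classification criterion.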

Domains

Computer Science [cs]

https://hal.science/hal-01416548

Submitted on: Wednesday, December 14, 2016, 15:54:51

Last modified on: Thursday, March 14, 2024, 14:40:45

Dates and versions

hal-01416548 , version 1 (14-12-2016)

Identifiers

HAL Id: hal-01416548, version 1
DOI: 10.1007/978-3-540-31865-1_11

Cite

Massih-Reza Amini, Nicolas Usunier, Patrick Gallinari. Automatic Text Summarization based on Word-Clusters and Ranking Algorithms. ECIR 2005 - 27th European Conference on Information Retrieval, Mar 2005, Santiago de Compostela, Spain. pp.142-156, ⟨10.1007/978-3-540-31865-1_11⟩. ⟨hal-01416548⟩

Collections

UPMC CNRS LIP6 SORBONNE-UNIVERSITE SU-SCIENCES
