Conference paper, CIRCLE 2020. Year: 2020

Capitalizing on a TREC Track to Build a Tweet Summarization Dataset

Abstract

There is currently no standard collection for evaluating automatic tweet summarization, and building such a large dataset is very tedious. In this paper, we examine whether the dataset proposed for the TREC Incident Streams (IS) track, which was not created for automatic summary generation, could be used for this purpose. When the TREC IS dataset is filtered using the assessors' annotations, it appears to satisfy the criteria identified in the automatic summarization literature. To verify this, we studied the TREC IS dataset and then proposed a subset summarizing each event, based on the assessors' annotations. This subset is evaluated against those criteria. Finally, several widely used state-of-the-art models for automatic text summarization, adapted to tweet summarization, were tested on the proposed dataset. The code, the annotations, and the results are provided on our GitHub.
Main file
CIRCLE20_20.pdf (1.07 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03095613, version 1 (4 January 2021)

License

Attribution - NonCommercial - NoDerivatives

Identifiers

  • HAL Id: hal-03095613, version 1

Cite

Alexis Dusart, Karen Pinel-Sauvagnat, Gilles Hubert. Capitalizing on a TREC Track to Build a Tweet Summarization Dataset. Joint Conference of the Information Retrieval Communities in Europe (CIRCLE 2020), Université de Toulouse, France, Jul 2020, Samatan, Gers, France. pp.1-9. ⟨hal-03095613⟩