Capitalizing on a TREC Track to Build a Tweet Summarization Dataset

Alexis Dusart; Karen Pinel-Sauvagnat; Gilles Hubert

Conference paper, CIRCLE 2020. Year: 2020

Alexis Dusart

Role: Author
PersonId: 1246793
ORCID: 0000-0001-8859-0313
IdRef: 268601445

Recherche d’Information et Synthèse d’Information

Karen Pinel-Sauvagnat

Role: Author
PersonId: 21007
IdHAL: karen-pinel-sauvagnat
ORCID: 0000-0003-3414-3803
IdRef: 095059245

Recherche d’Information et Synthèse d’Information

Gilles Hubert

Role: Author
PersonId: 737483
IdHAL: ghubert
ORCID: 0000-0003-3494-7561
IdRef: 031979890

Recherche d’Information et Synthèse d’Information

Abstract

There is currently no standard collection for evaluating automatic tweet summarization, and building such a large dataset is very tedious. In this paper, we examine whether the dataset proposed for the TREC Incident Streams (IS) track, which was not created for automatic summary generation, can be used for this purpose. When filtered using the assessors' annotations, the TREC IS dataset appears to satisfy the criteria identified in the literature on automatic summarization. We studied the TREC IS dataset and then built a subset summarizing each event, based on the assessors' annotations. We evaluated this subset against those criteria. Finally, we tested several widely used state-of-the-art models for automatic text summarization, adapted to tweet summarization, on the proposed dataset. The code, the annotations, and the results are available on our GitHub.

Domains

Information Retrieval [cs.IR]; Social and Information Networks [cs.SI]

Main file

CIRCLE20_20.pdf (1.07 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-03095613

Submitted on: Monday, January 4, 2021, 16:46:41

Last modified on: Tuesday, January 16, 2024, 16:19:54

Long-term archiving on: Monday, April 5, 2021, 21:14:03

Dates and versions

hal-03095613, version 1 (04-01-2021)

License

Attribution - NonCommercial - NoDerivatives

Identifiers

HAL Id: hal-03095613, version 1

Cite

Alexis Dusart, Karen Pinel-Sauvagnat, Gilles Hubert. Capitalizing on a TREC Track to Build a Tweet Summarization Dataset. Joint Conference of the Information Retrieval Communities in Europe (CIRCLE 2020), Université de Toulouse, France, Jul 2020, Samatan, Gers, France. pp.1-9. ⟨hal-03095613⟩

Collections

UNIV-TLSE2 CNRS SMS UT1-CAPITOLE IRIT IRIT-IRIS IRIT-GD IRIT-UT3 TOULOUSE-INP UNIV-UT3 UT3-TOULOUSEINP

49 views

31 downloads
