Conference papers

Tracking news stories in short messages in the era of infodemic

Abstract: Tracking news stories in documents is a way to deal with the large amount of information that surrounds us every day, to reduce noise, and to detect emerging topics in the news. Since the Covid-19 outbreak, the world has faced a new problem: the infodemic. News article titles are massively shared on social networks, and analysing trends and growing topics is complex. Grouping documents into news stories lowers the number of topics to analyse and the amount of information to ingest and/or evaluate. Our study analyses news tracking with the little information that titles shared on social networks provide. In this paper, we take advantage of datasets of public news article titles to experiment with news tracking algorithms on short messages. We evaluate clustering performance when only a small amount of data is available per document. We examine the document representation (sparse with TF-IDF and dense using Transformers [26]), its impact on the results, and why it is key to this type of work. We used a supervised algorithm proposed by Miranda et al. [22] and K-Means to provide evaluations for different use cases. We found that TF-IDF vectors are not always the best choice for grouping documents, and that the algorithms are sensitive to the type of representation. Knowing this, we recommend taking both aspects into account when tracking news stories in short messages. With this paper, we share all the source code and resources we handled.
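
To make the sparse-representation setting concrete, here is a minimal sketch of grouping short titles with TF-IDF vectors and K-Means. It is not the authors' released code, and it does not implement the supervised algorithm of Miranda et al. [22]. It assumes whitespace tokenisation, a smoothed IDF, cosine similarity (the spherical variant of K-Means), and hand-picked seed documents. The four titles and all function names are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenisation is enough for this toy example.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse, L2-normalised TF-IDF dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption of this sketch): log((1 + n) / (1 + df)) + 1.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    # Both inputs are unit-length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, seeds, iterations=10):
    """Spherical K-Means; `seeds` are indices of the documents used as initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each document to its most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid becomes the normalised sum of its members.
        for k in range(len(centroids)):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == k:
                    for t, w in v.items():
                        total[t] += w
            if not total:
                continue  # keep the previous centroid if the cluster is empty
            norm = math.sqrt(sum(w * w for w in total.values()))
            centroids[k] = {t: w / norm for t, w in total.items()}
    return labels


# Hypothetical titles covering two news stories.
titles = [
    "vaccine rollout begins in europe",
    "europe vaccine rollout faces delays",
    "wildfire spreads across california hills",
    "california wildfire forces evacuations",
]
vectors = tfidf_vectors(titles)
labels = kmeans(vectors, seeds=[0, 2])
print(labels)
```

With so few words per title, two documents are similar only if they share exact terms, which is one reason the choice of representation matters on short messages. A dense encoder could replace `tfidf_vectors` without changing the clustering step.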

https://hal.archives-ouvertes.fr/hal-03727200
Contributor: Guillaume Bernard
Submitted on: Tuesday, July 19, 2022 - 9:56:47 AM
Last modification on: Monday, September 5, 2022 - 2:31:58 PM
Long-term archiving on: Thursday, October 20, 2022 - 6:18:31 PM

File

main.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Guillaume Bernard, Cyrille Suire, Cyril Faucher, Antoine Doucet, Paolo Rosso. Tracking news stories in short messages in the era of infodemic. Conference and Labs of the Evaluation Forum (CLEF 2022), Università di Bologna, Italy, Sep 2022, Bologna, Italy. pp.18-32, ⟨10.1007/978-3-031-13643-6_2⟩. ⟨hal-03727200⟩
