
Hierarchical Multimodal Attention for Deep Video Summarization

Melissa Sanabria (1, 2, 3), Frédéric Precioso (1, 2, 3), Thomas Menguy (4)
2 MAASAI - Modèles et algorithmes pour l’intelligence artificielle
CRISAM - Inria Sophia Antipolis - Méditerranée , Laboratoire I3S - SPARKS - Scalable and Pervasive softwARe and Knowledge Systems, UNS - Université Nice Sophia Antipolis (... - 2019), JAD - Laboratoire Jean Alexandre Dieudonné
Abstract : The way people consume sports on TV has changed drastically in recent years, particularly under the combined effects of the legalization of sports betting and the huge increase in sports analytics. Several companies now send observers to stadiums to collect live data on all the events happening on the field during a match. These data provide a very detailed description of all the actions occurring during the match, and they feed coaches and staff, fans, viewers, and gamblers. Exploiting all these data, sports broadcasters want to generate extra content such as match highlights, match summaries, and player and team analytics to appeal to subscribers. This paper explores the problem of summarizing professional soccer matches as automatically as possible using both the aforementioned event-stream data collected from the field and the content broadcast on TV. We have designed an architecture that introduces (1) a Multiple Instance Learning method that takes into account the sequential dependency among events and (2) a hierarchical multimodal attention layer that captures the importance of each event in an action. We evaluate our approach on matches from two professional European soccer leagues, showing its capability to identify the best actions for automatic summarization by comparison with real summaries made by human operators.
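Illustration : The abstract names a hierarchical multimodal attention layer but does not specify it here. As a rough, generic sketch of the idea only (not the authors' architecture), two-level attention pooling can be written as follows: events are first weighted within each modality, then the resulting modality vectors are weighted against each other. The function names, the modality names `event_stream` and `broadcast`, the feature values, and the fixed query vectors are all assumptions made for illustration; in a trained model the queries would be learned parameters, and all modalities are assumed to share one feature dimension.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(vectors, query):
    """Weighted average of `vectors`; weights are a softmax of dot products with `query`."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def hierarchical_attention(action, event_query, modality_query):
    """`action` maps a modality name to a list of per-event feature vectors."""
    names = sorted(action)
    per_modality = []
    event_weights = {}
    # Level 1: attend over the events of each modality separately.
    for name in names:
        pooled, weights = attention_pool(action[name], event_query)
        per_modality.append(pooled)
        event_weights[name] = weights
    # Level 2: attend over the pooled modality vectors.
    fused, weights = attention_pool(per_modality, modality_query)
    return fused, event_weights, dict(zip(names, weights))


# Hypothetical toy action with three events and two modalities (made-up numbers).
action = {
    "event_stream": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    "broadcast": [[0.2, 0.1], [0.9, 0.3], [0.1, 0.1]],
}
fused, event_weights, modality_weights = hierarchical_attention(
    action, event_query=[1.0, 0.0], modality_query=[0.0, 1.0]
)
```

The per-event weights returned at the first level are what would let such a layer indicate which events matter most within an action, while the second-level weights indicate how much each modality contributes to the fused representation.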
Document type : Conference papers

Cited literature [52 references]
Contributor : Melissa Sanabria
Submitted on : Monday, October 12, 2020 - 11:36:37 AM
Last modification on : Friday, February 5, 2021 - 2:36:54 PM
Long-term archiving on : Wednesday, January 13, 2021 - 6:40:28 PM


Files produced by the author(s)


  • HAL Id : hal-02964209, version 1



Melissa Sanabria, Frédéric Precioso, Thomas Menguy. Hierarchical Multimodal Attention for Deep Video Summarization. 25th International Conference on Pattern Recognition, Jan 2021, Milan, Italy. ⟨hal-02964209⟩


