Conference papers

IRISA at TRECVid 2017: Beyond Crossmodal and Multimodal Models for Video Hyperlinking

Abstract : This paper presents the runs submitted to the TRECVid 2017 Challenge for the Video Hyperlinking task. The goal of the task is to propose a list of video segments, called targets, that complement query video segments, called anchors. The data provided with the task encourage participants to make use of multiple modalities, such as the audio track and the keyframes. In this context, we submitted four runs: 1) BiDNNFull uses a BiDNN model to combine ResNet with Word2Vec; 2) BiDNNFilter uses the same model and also exploits the metadata to narrow down the list of possible candidates; 3) BiDNNPinv tries to improve the anchor keyframe fusion by using the Moore-Penrose pseudo-inverse; and 4) noBiDNNPinv assesses the relevance of fusing the modalities without a BiDNN. Our runs build on a pre-trained ResNet model as well as the transcripts and the metadata provided by the task organizers. The results show a gain in performance over the baseline BiDNN model both when the metadata filter was used and when the keyframe fusion was done with a pseudo-inverse.

Cited literature : 14 references

Contributor : Guillaume Gravier
Submitted on : Tuesday, November 21, 2017 - 11:26:36 AM
Last modification on : Friday, April 8, 2022 - 4:08:03 PM


  • HAL Id : hal-01643232, version 1


Mikail Demirdelen, Mateusz Budnik, Gabriel Sargent, Rémi Bois, Guillaume Gravier. IRISA at TRECVid 2017: Beyond Crossmodal and Multimodal Models for Video Hyperlinking. Working Notes of the TRECVid 2017 Workshop, 2017, Gettysburg, United States. ⟨hal-01643232⟩


