IRISA at TRECVid 2017: Beyond Crossmodal and Multimodal Models for Video Hyperlinking

Abstract: This paper presents the runs submitted to the TRECVid 2017 Challenge for the Video Hyperlinking task. The goal of the task is to propose a list of video segments, called targets, that complement query video segments, called anchors. The data provided with the task encourage participants to use multiple modalities, such as the audio track and the keyframes. In this context, we submitted four runs: 1) BiDNNFull uses a BiDNN model to combine ResNet with Word2Vec; 2) BiDNNFilter uses the same model and additionally exploits the metadata to narrow down the list of candidate targets; 3) BiDNNPinv tries to improve the fusion of anchor keyframes by using the Moore-Penrose pseudo-inverse; and 4) noBiDNNPinv tests the relevance of not using a BiDNN to fuse the modalities. Our runs were built on a pre-trained ResNet model as well as the transcripts and metadata provided by the task organizers. The results show a gain in performance over the baseline BiDNN model both when the metadata filter was used and when the keyframe fusion was done with a pseudo-inverse.
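
The abstract does not detail how targets are ranked or how the metadata filter is applied. As a rough illustration only, the sketch below assumes that each segment already has a single embedding vector (standing in for whatever joint representation a model such as the BiDNN would produce), that targets are ranked by cosine similarity to the anchor, and that the filter keeps candidates sharing at least one metadata tag with the anchor. The dictionary keys `id`, `embedding` and `tags`, the function names, and the filter rule itself are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_targets(anchor, candidates, use_metadata_filter=True):
    """Return candidate ids sorted by decreasing similarity to the anchor.

    Each segment is a dict with hypothetical keys:
    "id" (str), "embedding" (list of floats), "tags" (set of str).
    """
    if use_metadata_filter:
        # Assumed rule: keep candidates sharing at least one tag with the anchor.
        pool = [c for c in candidates if anchor["tags"] & c["tags"]]
    else:
        pool = list(candidates)
    scored = [(cosine(anchor["embedding"], c["embedding"]), c["id"]) for c in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cid for _, cid in scored]


# Toy data, invented for illustration.
anchor = {"id": "a1", "embedding": [1.0, 0.0], "tags": {"cooking"}}
candidates = [
    {"id": "t1", "embedding": [0.9, 0.1], "tags": {"cooking", "travel"}},
    {"id": "t2", "embedding": [1.0, 0.0], "tags": {"sports"}},
    {"id": "t3", "embedding": [0.0, 1.0], "tags": {"cooking"}},
]

filtered = rank_targets(anchor, candidates)
unfiltered = rank_targets(anchor, candidates, use_metadata_filter=False)
```

In this toy data, the filter drops `t2` even though its embedding matches the anchor most closely, which shows how a metadata constraint can change the ranked list independently of the similarity score.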

Contributor: Guillaume Gravier
Submitted on: Tuesday, November 21, 2017 - 11:26:36
Last modified on: Thursday, November 15, 2018 - 11:59:01




  • HAL Id: hal-01643232, version 1


Mikail Demirdelen, Mateusz Budnik, Gabriel Sargent, Rémi Bois, Guillaume Gravier. IRISA at TRECVid 2017: Beyond Crossmodal and Multimodal Models for Video Hyperlinking. Working Notes of the TRECVid 2017 Workshop, 2017, Gettysburg, United States. 〈hal-01643232〉


