Conference paper, 2017

IRISA at TRECVid 2017: Beyond Crossmodal and Multimodal Models for Video Hyperlinking

Abstract

This paper presents the runs submitted to the TRECVid 2017 challenge for the Video Hyperlinking task. The goal of the task is to propose a list of video segments, called targets, that complement query video segments defined as anchors. The data provided with the task encourage participants to make use of multiple modalities, such as the audio track and the keyframes. In this context, we submitted four runs: 1) BiDNNFull uses a BiDNN model to combine ResNet features with Word2Vec embeddings; 2) BiDNNFilter uses the same model and additionally exploits the metadata to narrow down the list of candidate targets; 3) BiDNNPinv aims to improve the fusion of anchor keyframes by using the Moore-Penrose pseudo-inverse; and finally 4) noBiDNNPinv tests the relevance of not using a BiDNN to fuse the modalities. Our runs were built on a pre-trained ResNet model as well as the transcripts and metadata provided by the organizers of the task. The results show a gain in performance over the baseline BiDNN model both when the metadata filter was used and when the keyframe fusion was done with the pseudo-inverse.
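The pseudo-inverse keyframe fusion mentioned for the BiDNNPinv and noBiDNNPinv runs can be contrasted with plain averaging of a segment's keyframe descriptors. The sketch below is illustrative only: it assumes a memory-vector-style construction in which the fused vector is the minimum-norm solution giving unit dot product with every keyframe descriptor, and the function names pinv_fuse and mean_fuse are hypothetical, not taken from the paper.

    import numpy as np

    def pinv_fuse(keyframes):
        """Fuse a segment's keyframe descriptors into one vector via the
        Moore-Penrose pseudo-inverse.

        keyframes: (n, d) array, one row per L2-normalised ResNet descriptor.
        Returns the minimum-norm vector m with keyframes @ m ~= 1, i.e. every
        keyframe gets the same unit response, unlike a plain average that can
        be dominated by redundant, near-duplicate frames.

        NOTE: illustrative construction; the paper does not spell out the
        exact formulation of its pseudo-inverse fusion.
        """
        X = np.asarray(keyframes, dtype=np.float64)
        m = np.linalg.pinv(X) @ np.ones(X.shape[0])
        return m / np.linalg.norm(m)

    def mean_fuse(keyframes):
        """Baseline fusion: average the keyframe descriptors."""
        m = np.asarray(keyframes, dtype=np.float64).mean(axis=0)
        return m / np.linalg.norm(m)

    # Toy usage: score a candidate target against an anchor by cosine
    # similarity of the fused representations (2048-d mimics ResNet features).
    rng = np.random.default_rng(0)
    anchor = pinv_fuse(rng.normal(size=(5, 2048)))
    target = pinv_fuse(rng.normal(size=(3, 2048)))
    print("anchor/target cosine similarity:", float(anchor @ target))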
Main file: irisa-trecvid-lnk-2017.pdf (321.59 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01643232, version 1 (21-11-2017)

Identifiers

  • HAL Id: hal-01643232, version 1

Cite

Mikail Demirdelen, Mateusz Budnik, Gabriel Sargent, Rémi Bois, Guillaume Gravier. IRISA at TRECVid 2017: Beyond Crossmodal and Multimodal Models for Video Hyperlinking. Working Notes of the TRECVid 2017 Workshop, 2017, Gettysburg, United States. ⟨hal-01643232⟩