Conference paper. Year: 2019

Fusion of Multimodal Embeddings for Ad-Hoc Video Search

Abstract

The challenge of Ad-Hoc Video Search (AVS) stems from its free-form (i.e., no pre-defined vocabulary) and freestyle (i.e., natural language) query descriptions. Bridging the semantic gap between AVS queries and videos is highly difficult, as evidenced by the low retrieval accuracy reported in the TRECVID AVS benchmark. In this paper, we study a new method to fuse multimodal embeddings that were derived from completely disjoint datasets. The method is tested on two datasets for two distinct tasks: on MSR-VTT for unique video retrieval and on V3C1 for multiple video retrieval.
Main file
publi-6052.pdf (391.89 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03555283, version 1 (03-02-2022)

Cite

Danny Francis, Phuong Anh Nguyen, Benoit Huet, Chong-Wah Ngo. Fusion of Multimodal Embeddings for Ad-Hoc Video Search. 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Oct 2019, Seoul, South Korea. pp.1868-1872, ⟨10.1109/ICCVW.2019.00233⟩. ⟨hal-03555283⟩

Collections

EURECOM ANR