Bidirectional Joint Representation Learning with Symmetrical Deep Neural Networks for Multimodal and Crossmodal Applications

Vedran Vukotić 1 Christian Raymond 1 Guillaume Gravier 1
1 LinkMedia - Creating and exploiting explicit links between multimedia fragments
IRISA-D6 - MEDIA ET INTERACTIONS, Inria Rennes – Bretagne Atlantique
Abstract: Common approaches to problems involving multiple modalities (classification, retrieval, hyperlinking, etc.) are early fusion of the initial modalities and crossmodal translation from one modality to the other. Recently, deep neural networks, especially deep autoencoders, have proven promising both for crossmodal translation and for early fusion via multimodal embedding. In this work, we propose a flexible crossmodal deep neural network architecture for multimodal and crossmodal representation. By tying the weights of two deep neural networks, symmetry is enforced in the central hidden layers, yielding a multimodal representation space common to the two original representation spaces. The proposed architecture is evaluated on multimodal query expansion and multimodal retrieval tasks within the context of video hyperlinking. Our method demonstrates improved crossmodal translation capabilities and produces a multimodal embedding that significantly outperforms multimodal embeddings obtained by deep autoencoders, resulting in an absolute increase of 14.14 in precision at 10 on a video hyperlinking task.
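The tied-weight idea described in the abstract can be illustrated with a minimal sketch: two crossmodal translation networks (modality A to B and B to A) share the weights of their central layer, so both modalities project into one joint space, and a multimodal embedding is obtained by concatenating the two projections. All dimensions, variable names, and the shared matrix `W_joint` below are illustrative assumptions, not the authors' exact configuration or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: modality A (e.g. speech transcripts) and
# modality B (e.g. visual features), a hidden layer, and the joint space.
dim_a, dim_b, dim_hidden, dim_joint = 100, 80, 64, 32

# Outer, modality-specific layers (one encoder and one decoder per modality).
W_a_in  = rng.standard_normal((dim_hidden, dim_a)) * 0.1   # A -> hidden
W_b_in  = rng.standard_normal((dim_hidden, dim_b)) * 0.1   # B -> hidden
W_a_out = rng.standard_normal((dim_a, dim_hidden)) * 0.1   # hidden -> A
W_b_out = rng.standard_normal((dim_b, dim_hidden)) * 0.1   # hidden -> B

# Central weights are SHARED (tied) between the two translation networks,
# enforcing symmetry and a representation space common to both modalities.
W_joint = rng.standard_normal((dim_joint, dim_hidden)) * 0.1

def embed_a(x_a):
    """Project modality A into the joint space (A -> B direction)."""
    return np.tanh(W_joint @ np.tanh(W_a_in @ x_a))

def embed_b(x_b):
    """Project modality B into the joint space, reusing the tied W_joint."""
    return np.tanh(W_joint @ np.tanh(W_b_in @ x_b))

def translate_a_to_b(x_a):
    """Crossmodal translation: A -> joint space -> reconstructed B."""
    return np.tanh(W_b_out @ np.tanh(W_joint.T @ embed_a(x_a)))

# Multimodal embedding: concatenate the two joint-space projections.
x_a, x_b = rng.standard_normal(dim_a), rng.standard_normal(dim_b)
multimodal = np.concatenate([embed_a(x_a), embed_b(x_b)])
print(multimodal.shape)  # (64,)
```

In a trained system the weights would be learned by minimizing the two translation losses jointly; here only the untrained forward pass is shown, to make the weight-tying explicit.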

Cited literature [9 references]

https://hal.inria.fr/hal-01314302
Contributor: Vedran Vukotić
Submitted on: Wednesday, May 11, 2016 - 10:41:38 AM
Last modification on: Thursday, February 7, 2019 - 4:20:59 PM
Long-term archiving on: Wednesday, November 16, 2016 - 12:30:47 AM

File

vukotic_BiDNN.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01314302, version 1

Citation

Vedran Vukotić, Christian Raymond, Guillaume Gravier. Bidirectional Joint Representation Learning with Symmetrical Deep Neural Networks for Multimodal and Crossmodal Applications. ICMR, ACM, Jun 2016, New York, United States. ⟨hal-01314302⟩
