Finding beans in burgers: Deep semantic-visual embedding with localization

Martin Engilberge; Louis Chevallier; Patrick Pérez; Matthieu Cord

doi:10.1109/CVPR.2018.00419

Conference paper. Year: 2018

Martin Engilberge

Role: Author

Technicolor R & I [Cesson Sévigné]

Machine Learning and Information Access

InterDigital Communications

Louis Chevallier

Role: Author

Technicolor R & I [Cesson Sévigné]

Patrick Pérez

Role: Author
PersonId: 1022281

Valeo.ai

Matthieu Cord

Role: Author
PersonId: 13617
IdHAL: matthieucord
ORCID: 0000-0002-0627-5844
IdRef: 132968126

Machine Learning and Information Access

Valeo.ai

Abstract

Several works have proposed to learn a two-path neural network that maps images and texts, respectively, to a shared Euclidean space where geometry captures useful semantic relationships. Such a multi-modal embedding can be trained and used for various tasks, notably image captioning. In the present work, we introduce a new architecture of this type, with a visual path that leverages recent space-aware pooling mechanisms. Combined with a textual path that is jointly trained from scratch, our semantic-visual embedding offers a versatile model. Once trained under the supervision of captioned images, it yields new state-of-the-art performance on cross-modal retrieval. It also allows the localization of new concepts from the embedding space into any input image, delivering state-of-the-art results on the visual grounding of phrases.
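To make the retrieval setting concrete, here is a minimal sketch of cross-modal retrieval in a shared embedding space. It is not the authors' implementation: the vectors are made-up toy values standing in for the outputs of the visual and textual paths, the function names (`rank_images`, `recall_at_k`) are hypothetical, and the choice of cosine similarity and recall@K is an assumption based on common practice for this task rather than something stated in the abstract.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal dimension."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_images(caption_vec, image_vecs):
    """Return image indices sorted from most to least similar to the caption."""
    scores = [cosine_similarity(caption_vec, v) for v in image_vecs]
    return sorted(range(len(image_vecs)), key=lambda i: scores[i], reverse=True)


def recall_at_k(caption_vecs, image_vecs, k):
    """Fraction of captions whose matching image (same index) is in the top k."""
    hits = sum(
        1
        for i, caption in enumerate(caption_vecs)
        if i in rank_images(caption, image_vecs)[:k]
    )
    return hits / len(caption_vecs)


# Toy embeddings (assumed values): caption i is meant to describe image i.
image_vecs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
caption_vecs = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.7, 0.1, 0.6]]

recall_1 = recall_at_k(caption_vecs, image_vecs, k=1)
recall_2 = recall_at_k(caption_vecs, image_vecs, k=2)
```

In this toy data the third caption lies slightly closer to the first image than to its own, so it counts as a miss at K=1 and a hit at K=2. In a trained model, the geometry of the shared space is what makes such nearest-neighbor lookups meaningful.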

Domains

Computer Science [cs]; Machine Learning [cs.LG]; Computer Vision and Pattern Recognition [cs.CV]; Computation and Language [cs.CL]; Neural Networks [cs.NE]

Main file

findingbeansinburger.pdf (2.64 MB)

Origin: Files produced by the author(s)

Contributor: Martin Engilberge

https://hal.science/hal-02171857

Submitted on: Wednesday, July 3, 2019, 11:25:21

Last modified on: Saturday, October 7, 2023, 21:36:22

Dates and versions

hal-02171857, version 1 (03-07-2019)

Identifiers

HAL Id: hal-02171857, version 1
arXiv: 1804.01720
DOI: 10.1109/CVPR.2018.00419

Cite

Martin Engilberge, Louis Chevallier, Patrick Pérez, Matthieu Cord. Finding beans in burgers: Deep semantic-visual embedding with localization. CVPR 2018 - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun 2018, Salt Lake City, United States. pp.3984-3993, ⟨10.1109/CVPR.2018.00419⟩. ⟨hal-02171857⟩

Collections

CNRS LIP6 SORBONNE-UNIVERSITE SU-SCIENCES ANR

56 views

30 downloads
