Conference papers

Detecting unseen visual relations using analogies

Julia Peyre (1, 2), Ivan Laptev (1, 2), Cordelia Schmid (3), Josef Sivic (1, 2, 4)
1 WILLOW - Models of visual object recognition and scene understanding
Inria de Paris, DI-ENS - Département d'informatique de l'École normale supérieure
3 Thoth - Apprentissage de modèles à partir de données massives
LJK - Laboratoire Jean Kuntzmann, Inria Grenoble - Rhône-Alpes
Abstract: We seek to detect visual relations in images, expressed as triplets t = (subject, predicate, object) such as “person riding dog”, where training examples of the individual entities are available but their combinations are not seen during training. This is an important setup because visual relations are combinatorial: collecting sufficient training data for all possible triplets would be very hard. The contributions of this work are threefold. First, we learn a representation of visual relations that combines (i) individual embeddings for subject, object and predicate together with (ii) a visual phrase embedding that represents the relation triplet. Second, we learn how to transfer visual phrase embeddings from existing training triplets to unseen test triplets using analogies between relations that involve similar objects. Third, we demonstrate the benefits of our approach on three challenging datasets. On HICO-DET, our model achieves a significant improvement over a strong baseline for both frequent and unseen triplets. We observe a similar improvement for the retrieval of unseen triplets with out-of-vocabulary predicates on the COCO-a dataset, as well as for the challenging unusual triplets in the UnRel dataset.
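To make the idea of transferring phrase embeddings by analogy more concrete, the following is a minimal toy sketch in Python with NumPy. It is not the authors' implementation: the additive shift, the cosine-similarity weighting, the softmax over the top-k sources, the function name `transfer_phrase_embedding`, and all vectors are assumptions chosen for illustration only.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def transfer_phrase_embedding(target, seen_phrases, word_vecs, k=2):
    """Estimate a phrase embedding for an unseen triplet (hypothetical helper).

    target:       (subject, predicate, object) tuple of words
    seen_phrases: dict mapping seen triplets to phrase vectors of size 3*d
    word_vecs:    dict mapping each word to a vector of size d
    """
    s_t, p_t, o_t = (word_vecs[w] for w in target)
    scored = []
    for source, phrase_vec in seen_phrases.items():
        s, p, o = (word_vecs[w] for w in source)
        # Assumed similarity: mean cosine similarity over the three slots.
        sim = (cosine(s, s_t) + cosine(p, p_t) + cosine(o, o_t)) / 3.0
        # Assumed analogy: shift the source phrase by the word-level differences.
        shifted = phrase_vec + np.concatenate([s_t - s, p_t - p, o_t - o])
        scored.append((sim, shifted))
    # Keep the k most similar seen triplets and weight them with a softmax.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    sims = np.array([sim for sim, _ in top])
    weights = np.exp(sims) / np.exp(sims).sum()
    return sum(w * vec for w, (_, vec) in zip(weights, top))


# Toy word vectors (made up for this example).
word_vecs = {
    "person": np.array([1.0, 0.0, 0.0]),
    "riding": np.array([0.0, 1.0, 0.0]),
    "holding": np.array([0.5, 0.5, 0.0]),
    "horse": np.array([0.0, 0.0, 1.0]),
    "dog": np.array([0.0, 0.2, 0.9]),
    "cup": np.array([1.0, 0.0, 1.0]),
}


def concat(triplet):
    """Concatenate the word vectors of a triplet."""
    return np.concatenate([word_vecs[w] for w in triplet])


# Toy "seen" phrase embeddings: concatenated word vectors plus a fixed offset.
seen = {
    ("person", "riding", "horse"): concat(("person", "riding", "horse")) + 1.0,
    ("person", "holding", "cup"): concat(("person", "holding", "cup")) + 5.0,
}

unseen = ("person", "riding", "dog")
estimate = transfer_phrase_embedding(unseen, seen, word_vecs, k=1)
```

In this toy setup, "person riding horse" is the seen triplet closest to the unseen "person riding dog", so with `k=1` the estimate is the horse phrase vector shifted by the difference between the "dog" and "horse" word vectors. A learned model would replace both the hand-written shift and the similarity weights with trained functions.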

https://hal.archives-ouvertes.fr/hal-01975760
Contributor: Julia Peyre
Submitted on: Wednesday, January 9, 2019 - 3:21:02 PM
Last modification on: Monday, April 20, 2020 - 9:19:45 AM


Citation

Julia Peyre, Ivan Laptev, Cordelia Schmid, Josef Sivic. Detecting unseen visual relations using analogies. ICCV 2019 - International Conference on Computer Vision, Oct 2019, Seoul, South Korea. pp.1981-1990, ⟨10.1109/ICCV.2019.00207⟩. ⟨hal-01975760⟩
