Detecting rare visual relations using analogies

Julia Peyre (1, 2), Ivan Laptev (1, 2), Cordelia Schmid (3), Josef Sivic (1, 2, 4)
1 WILLOW - Models of visual object recognition and scene understanding
DI-ENS - Département d'informatique de l'École normale supérieure, Inria de Paris
3 Thoth - Learning models from massive data
LJK - Laboratoire Jean Kuntzmann, Inria Grenoble - Rhône-Alpes
Abstract: We seek to detect visual relations in images in the form of triplets t = (subject, predicate, object), such as "person riding dog", where training examples of the individual entities are available but their combinations are rare or unseen at training time. This setting is important because visual relations are combinatorial: collecting sufficient training data for every possible triplet would be very hard. The contributions of this work are three-fold. First, we learn a representation of visual relations that combines (i) individual embeddings for subject, object and predicate with (ii) a visual phrase embedding that represents the relation triplet as a whole. Second, we learn how to transfer visual phrase embeddings from existing training triplets to unseen test triplets using analogies between relations that involve similar objects. Third, we demonstrate the benefits of our approach on two challenging datasets involving rare and unseen relations: on HICO-DET, our model achieves a significant improvement over a strong baseline, and we confirm this improvement on retrieval of unseen triplets on the UnRel rare relation dataset.
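The analogy idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the transfer is computed, so the sketch swaps in a simple stand-in, a similarity-weighted average of the visual phrase embeddings of the most similar seen triplets. The function names, the toy vectors, and the parameter `k` are all hypothetical.

```python
import numpy as np


def unit(v):
    """Scale a vector to unit L2 norm."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def transfer_by_analogy(target, sources, word_emb, phrase_emb, k=2):
    """Estimate a phrase embedding for an unseen triplet (illustrative only).

    target     -- (subject, predicate, object) with no phrase embedding
    sources    -- list of seen triplets that do have phrase embeddings
    word_emb   -- dict: word -> unit vector (toy individual embeddings)
    phrase_emb -- dict: seen triplet -> vector (toy visual phrase embeddings)
    k          -- number of most similar source triplets to combine
    """
    def triplet_vec(t):
        # Concatenate the subject, predicate and object embeddings.
        return np.concatenate([word_emb[w] for w in t])

    tgt = triplet_vec(target)
    # Each part is unit-norm, so this dot product is the sum of the
    # subject, predicate and object cosine similarities.
    sims = np.array([triplet_vec(s) @ tgt for s in sources])
    top = np.argsort(sims)[::-1][:k]
    # Softmax weights over the k most similar source triplets.
    weights = np.exp(sims[top])
    weights /= weights.sum()
    estimate = sum(w * phrase_emb[sources[i]] for w, i in zip(weights, top))
    return unit(estimate), [sources[i] for i in top]


# Toy individual embeddings (assumed values, chosen so "dog" is near "horse").
word_emb = {
    "person": unit([1, 0, 0, 0, 0]),
    "riding": unit([0, 1, 0.2, 0, 0]),
    "sitting on": unit([0, 0.2, 1, 0, 0]),
    "horse": unit([0, 0, 0, 1, 0.1]),
    "dog": unit([0, 0, 0, 0.9, 0.3]),
    "chair": unit([0, 0, 0, 0.1, 1]),
}

seen = [("person", "riding", "horse"), ("person", "sitting on", "chair")]
# Toy visual phrase embeddings for the seen triplets (assumed values).
phrase_emb = {seen[0]: unit([1, 0, 0]), seen[1]: unit([0, 1, 0])}

unseen = ("person", "riding", "dog")
estimate, neighbors = transfer_by_analogy(unseen, seen, word_emb, phrase_emb)
```

In this toy setting, the unseen triplet "person riding dog" borrows most of its estimated phrase embedding from "person riding horse", because the two triplets share a subject and predicate and have similar objects.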
Contributor: Julia Peyre
Submitted on: Wednesday, January 9, 2019 - 15:21:02
Last modified on: Tuesday, January 29, 2019 - 15:05:42

Full-text link


  • HAL Id: hal-01975760, version 1
  • arXiv: 1812.05736



Julia Peyre, Ivan Laptev, Cordelia Schmid, Josef Sivic. Detecting rare visual relations using analogies. 2019. 〈hal-01975760〉


