
Image search using multilingual texts: a cross-modal learning approach between image and text

Abstract : Multilingual (or cross-lingual) embeddings represent several languages in a single vector space. A common embedding space gives words from different languages a shared semantics. In this paper, we propose to embed images and texts into a single distributional vector space, which makes it possible to search images using text queries that express information needs related to the (visual) content of images, as well as by image similarity. Our framework forces the representation of an image to be similar to the representation of the text that describes it. Moreover, by using multilingual embeddings, we ensure that words from two different languages have close descriptors and are thus attached to similar images. We provide experimental evidence of the efficiency of our approach by evaluating it on two datasets: Common Objects in COntext (COCO) [19] and Multi30K [7].
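To make the idea of a shared space concrete, here is a minimal sketch of text-to-image retrieval by cosine similarity. It is not the authors' implementation: the 3-dimensional vectors, the `word_vecs` and `image_vecs` names, and the helper functions are all hypothetical stand-ins for embeddings that a trained model would produce.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors to unit length so dot products equal cosine similarities."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def rank_images(query_vec, image_vecs):
    """Return image indices sorted from most to least similar to the query."""
    sims = l2_normalize(image_vecs) @ l2_normalize(query_vec)
    return np.argsort(-sims)

# Toy vectors standing in for learned embeddings (hypothetical values).
image_vecs = np.array([
    [0.9, 0.1, 0.0],   # image 0: assumed to show a dog
    [0.0, 0.2, 0.9],   # image 1: assumed to show a car
])
word_vecs = {
    "dog":   np.array([1.0, 0.0, 0.1]),   # English query word
    "chien": np.array([0.9, 0.1, 0.1]),   # French query word, placed close to "dog"
}

ranking_en = rank_images(word_vecs["dog"], image_vecs)
ranking_fr = rank_images(word_vecs["chien"], image_vecs)
```

Because the two query words sit close together in the toy space, both rankings put the same image first, which is the behavior the abstract attributes to multilingual embeddings.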

Contributor : Maxime Portaz
Submitted on : Monday, March 25, 2019 - 2:34:00 PM
Last modification on : Monday, February 21, 2022 - 3:38:09 PM
Long-term archiving on : Wednesday, June 26, 2019 - 12:37:30 PM


Files produced by the author(s)


  • HAL Id : hal-02077556, version 1
  • ARXIV : 1903.11299


Maxime Portaz, Hicham Randrianarivo, Adrien Nivaggioli, Estelle Maudet, Christophe Servan, et al. Image search using multilingual texts: a cross-modal learning approach between image and text. [Research Report] Qwant Research. 2019. ⟨hal-02077556⟩


