A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images - HAL open archive
Conference paper, 2018


Abstract

Several recent studies have shown the benefits of combining language and perception to infer word embeddings. These multimodal approaches either simply combine pre-trained textual and visual representations (e.g. features extracted from convolutional neural networks), or use the latter to bias the learning of textual word embeddings. In this work, we propose a novel probabilistic model to formalize how linguistic and perceptual inputs can work in concert to explain the observed word-context pairs in a text corpus. Our approach learns textual and visual representations jointly: latent visual factors couple together a skip-gram model for co-occurrence in linguistic data and a generative latent variable model for visual data. Extensive experimental studies validate the proposed model. Concretely, on the tasks of assessing pairwise word similarity and image/caption retrieval, our approach attains equally competitive or stronger results when compared to other state-of-the-art multimodal models.
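The coupling described in the abstract — latent visual factors contributing to word representations that are scored under a skip-gram objective — can be sketched as follows. This is a minimal illustration under assumed parameterization and shapes, not the authors' actual model: the names `T`, `C`, `A`, `Z` and the additive coupling `T[i] + A @ Z[i]` are hypothetical choices for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

V, d, k = 5, 8, 3               # vocab size, embedding dim, number of latent visual factors
T = rng.normal(size=(V, d))     # textual embedding parameters (hypothetical)
C = rng.normal(size=(V, d))     # context embedding parameters
A = rng.normal(size=(d, k))     # maps latent visual factors into embedding space
Z = rng.normal(size=(V, k))     # per-word latent visual factors

def word_vec(i):
    # Joint representation: textual part plus the contribution of the
    # latent visual factors (the exact coupling in the paper may differ).
    return T[i] + A @ Z[i]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def skipgram_nll(pairs):
    # Negative log-likelihood of observed (word, context) pairs under a
    # sigmoid link, as in skip-gram with negative sampling (positive
    # pairs only, for brevity).
    return -sum(np.log(sigmoid(word_vec(i) @ C[j])) for i, j in pairs)

loss = skipgram_nll([(0, 1), (2, 3)])
```

In a full model, the textual loss above would be optimized jointly with a generative likelihood tying `Z` to image features, so that the visual evidence shapes the word embeddings through the shared factors.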
Main file: emnlp18.pdf (372.59 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01922985 , version 1 (14-11-2018)

Identifiers

  • HAL Id: hal-01922985, version 1

Cite

Melissa Ailem, Bowen Zhang, Aurélien Bellet, Pascal Denis, Fei Sha. A Probabilistic Model for Joint Learning of Word Embeddings from Texts and Images. Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), 2018, Brussels, Belgium. ⟨hal-01922985⟩
110 Views
199 Downloads
