Conference paper. Year: 2022

Combining Vision and Language Representations for Patch-based Identification of Lexico-Semantic Relations

Abstract

Although a wide range of applications have been proposed in the field of multimodal natural language processing, very few works have tackled multimodal relational lexical semantics. In this paper, we propose the first attempt to identify lexico-semantic relations, which embody linguistic phenomena such as synonymy, co-hyponymy and hypernymy, with the help of visual clues. While traditional methods rely on the paradigmatic approach and/or the distributional hypothesis, we hypothesize that visual information can supplement the textual information, drawing on the apperceptum subcomponent of the semiotic textology linguistic theory. For that purpose, we automatically extend two gold-standard datasets with visual information and develop different fusion techniques to combine the textual and visual modalities following a patch-based strategy. Experimental results over the multimodal datasets show that visual information can supplement the missing semantics of textual encodings, yielding reliable performance improvements.
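The abstract does not spell out the fusion architecture, but the general idea of patch-based fusion can be illustrated with a minimal sketch: textual embeddings of a word pair are concatenated with pooled embeddings of the image patches associated with each word, and a small classifier predicts the relation. Everything below (embedding dimensions, average pooling over patches, the MLP head, the number of relation classes) is an assumption made for illustration, not the authors' actual model.

```python
import torch
import torch.nn as nn

class PatchFusionClassifier(nn.Module):
    """Illustrative late-fusion sketch (not the paper's architecture):
    concatenate the textual encodings of a word pair with pooled visual
    patch embeddings and predict a lexico-semantic relation
    (e.g. synonymy, co-hyponymy, hypernymy)."""

    def __init__(self, text_dim=300, patch_dim=768, hidden_dim=512, num_relations=4):
        super().__init__()
        # assumed: mean-pooling of the N patch vectors into a single visual vector
        self.patch_pool = nn.AdaptiveAvgPool1d(1)
        self.classifier = nn.Sequential(
            nn.Linear(2 * text_dim + 2 * patch_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_relations),
        )

    def forward(self, text_a, text_b, patches_a, patches_b):
        # patches_*: (batch, num_patches, patch_dim) -> (batch, patch_dim)
        vis_a = self.patch_pool(patches_a.transpose(1, 2)).squeeze(-1)
        vis_b = self.patch_pool(patches_b.transpose(1, 2)).squeeze(-1)
        # simple concatenation fusion of both modalities for both words
        fused = torch.cat([text_a, text_b, vis_a, vis_b], dim=-1)
        return self.classifier(fused)

# toy usage: a batch of 2 word pairs, each word associated with 16 image patches
model = PatchFusionClassifier()
logits = model(torch.randn(2, 300), torch.randn(2, 300),
               torch.randn(2, 16, 768), torch.randn(2, 16, 768))
print(logits.shape)  # torch.Size([2, 4])
```

More elaborate fusion schemes (e.g. attention over patches rather than average pooling) follow the same pattern of producing one visual vector per word before combining it with the textual encoding.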
No file deposited

Dates and versions

hal-03720737, version 1 (12-07-2022)

Identifiers

Cite

Prince Jha, Gaël Dias, Alexis Lechervy, Jose G. Moreno, Anubhav Jangra, et al.. Combining Vision and Language Representations for Patch-based Identification of Lexico-Semantic Relations. 30th ACM International Conference on Multimedia (ACM MM 2022), Oct 2022, Lisbonne, Portugal. pp.4406-4415, ⟨10.1145/3503161.3548299⟩. ⟨hal-03720737⟩