Conference paper. Year: 2022

Combining Vision and Language Representations for Patch-based Identification of Lexico-Semantic Relations

Abstract

Although a wide range of applications have been proposed in the field of multimodal natural language processing, very few works have tackled multimodal relational lexical semantics. In this paper, we propose the first attempt to identify lexico-semantic relations, which embody linguistic phenomena such as synonymy, co-hyponymy and hypernymy, with the help of visual clues. While traditional methods rely on the paradigmatic approach and/or the distributional hypothesis, we hypothesize that visual information can supplement the textual information, drawing on the apperceptum subcomponent of the semiotic textology linguistic theory. For that purpose, we automatically extend two gold-standard datasets with visual information and develop different fusion techniques to combine the textual and visual modalities following a patch-based strategy. Experimental results over the multimodal datasets show that visual information can supplement the missing semantics of textual encodings, yielding reliable performance improvements.
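The abstract does not spell out the fusion architecture, but the general idea of patch-based fusion can be illustrated with a minimal sketch: textual embeddings of a word pair are concatenated with pooled embeddings of the image patches associated with each word, and a small classifier predicts the relation. Everything below (embedding dimensions, average pooling over patches, the MLP head, the number of relation classes) is an assumption made for illustration, not the authors' actual model.

```python
import torch
import torch.nn as nn

class PatchFusionClassifier(nn.Module):
    """Illustrative late-fusion sketch (not the paper's architecture):
    concatenate the textual encodings of a word pair with pooled visual
    patch embeddings and predict a lexico-semantic relation
    (e.g. synonymy, co-hyponymy, hypernymy)."""

    def __init__(self, text_dim=300, patch_dim=768, hidden_dim=512, num_relations=4):
        super().__init__()
        # assumed: mean-pooling of the N patch vectors into a single visual vector
        self.patch_pool = nn.AdaptiveAvgPool1d(1)
        self.classifier = nn.Sequential(
            nn.Linear(2 * text_dim + 2 * patch_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_relations),
        )

    def forward(self, text_a, text_b, patches_a, patches_b):
        # patches_*: (batch, num_patches, patch_dim) -> (batch, patch_dim)
        vis_a = self.patch_pool(patches_a.transpose(1, 2)).squeeze(-1)
        vis_b = self.patch_pool(patches_b.transpose(1, 2)).squeeze(-1)
        # simple concatenation fusion of both modalities for both words
        fused = torch.cat([text_a, text_b, vis_a, vis_b], dim=-1)
        return self.classifier(fused)

# toy usage: a batch of 2 word pairs, each word associated with 16 image patches
model = PatchFusionClassifier()
logits = model(torch.randn(2, 300), torch.randn(2, 300),
               torch.randn(2, 16, 768), torch.randn(2, 16, 768))
print(logits.shape)  # torch.Size([2, 4])
```

More elaborate fusion schemes (e.g. attention over patches rather than average pooling) follow the same pattern of producing one visual vector per word before combining it with the textual encoding.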
No file deposited

Dates and versions

hal-03720737, version 1 (12-07-2022)

Identifiers

Cite

Prince Jha, Gaël Dias, Alexis Lechervy, Jose G. Moreno, Anubhav Jangra, et al.. Combining Vision and Language Representations for Patch-based Identification of Lexico-Semantic Relations. 30th ACM International Conference on Multimedia (ACM MM 2022), Oct 2022, Lisbonne, Portugal. pp.4406-4415, ⟨10.1145/3503161.3548299⟩. ⟨hal-03720737⟩