Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech - HAL open archive
Conference paper. Year: 2019

Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech

Abstract

In this paper, we study how word-like units are represented and activated in a recurrent neural model of visually grounded speech. The model used in our experiments is trained to project an image and its spoken description into a common representation space. We show that a recurrent model trained on spoken sentences implicitly segments its input into word-like units and reliably maps them to their correct visual referents. We introduce the gating paradigm, a methodology originating from linguistics, to analyse the representations learned by neural networks, and show that the correct representation of a word is only activated if the network has access to the first phoneme of the target word, suggesting that the network does not rely on a global acoustic pattern. Furthermore, we find that not all speech frames (MFCC vectors in our case) play an equal role in the final encoded representation of a given word, and that some frames have a crucial effect on it. Finally, we suggest that word representations could be activated through a process of lexical competition.
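The gating idea mentioned in the abstract can be illustrated with a minimal sketch: the encoder is given increasingly long initial portions (gates) of a word's frame sequence, and each truncated encoding is compared with a reference vector. Everything named here is an assumption made for illustration. `mean_pool_encoder` is a hypothetical stand-in, not the recurrent model studied in the paper. The reference vector is taken to be the encoding of the full word. The frames are toy 2-dimensional vectors rather than real MFCCs, and the sketch only truncates from the end of the word.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]
Vector = List[float]


def mean_pool_encoder(frames: Sequence[Frame]) -> Vector:
    """Hypothetical stand-in encoder: average the frames over time."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def gate_prefixes(frames: Sequence[Frame], step: int = 1) -> List[Sequence[Frame]]:
    """Return increasingly long prefixes of the word, ending with the full word."""
    n = len(frames)
    ends = list(range(step, n, step)) + [n]
    return [frames[:end] for end in ends]


def gating_curve(
    frames: Sequence[Frame],
    reference: Sequence[float],
    encoder: Callable[[Sequence[Frame]], Vector],
    step: int = 1,
) -> List[float]:
    """Similarity between each gated encoding and the reference vector."""
    return [cosine(encoder(gate), reference) for gate in gate_prefixes(frames, step)]


# Toy "word": four 2-dimensional frames (illustrative values, not real MFCCs).
word = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
reference = mean_pool_encoder(word)  # assumption: reference = full-word encoding
curve = gating_curve(word, reference, mean_pool_encoder)
print(curve)
```

With this toy setup the last value of `curve` is the similarity of the full word with itself, and earlier values show how much of the final encoding is already recoverable from each prefix. Replacing `mean_pool_encoder` with a trained speech encoder, and the reference with the embedding of a matching image, would bring the sketch closer to the kind of analysis the abstract describes.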
Main file
CoNLL.pdf (2.58 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02359540, version 1 (12-11-2019)

Identifiers

Cite

William N Havard, Jean-Pierre Chevrot, Laurent Besacier. Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech. Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), Nov 2019, Hong Kong, China. pp.339-348, ⟨10.18653/v1/K19-1032⟩. ⟨hal-02359540⟩
73 views
79 downloads
