MODELS OF VISUALLY GROUNDED SPEECH SIGNAL PAY ATTENTION TO NOUNS: A BILINGUAL EXPERIMENT ON ENGLISH AND JAPANESE

Abstract : We investigate the behaviour of attention in neural models of visually grounded speech trained on two languages: English and Japanese. Experimental results show that attention focuses on nouns, and that this behaviour holds for two typologically very different languages. We also draw parallels between artificial neural attention and human attention, and show that neural attention focuses on word endings, as has been theorised for human attention. Finally, we investigate how two visually grounded monolingual models can be used to perform cross-lingual speech-to-speech retrieval. For both languages, the bilingual (speech-image) corpora, enriched with part-of-speech tags and forced alignments, are distributed to the community for reproducible research.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-02193872
Contributor : Laurent Besacier
Submitted on : Wednesday, July 24, 2019 - 6:52:07 PM
Last modification on : Tuesday, July 30, 2019 - 3:42:15 PM

File

ICASSP2019_CAMERA-READY.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02193872, version 1

Citation

William Havard, Jean-Pierre Chevrot, Laurent Besacier. MODELS OF VISUALLY GROUNDED SPEECH SIGNAL PAY ATTENTION TO NOUNS: A BILINGUAL EXPERIMENT ON ENGLISH AND JAPANESE. IEEE ICASSP 2019, May 2019, Brighton, United Kingdom. ⟨hal-02193872⟩
