From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions, Transactions of the Association for Computational Linguistics, vol.2, pp.67-78, 2014. ,
Microsoft coco: Common objects in context, Computer Vision-ECCV 2014, pp.740-755, 2014. ,
Deep multimodal semantic embeddings for speech and images, IEEE Automatic Speech Recognition and Understanding Workshop, pp.237-244, 2015. ,
Representations of language in a model of visually grounded speech signal, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.613-622, 2017. ,
Speech-coco: 600k visually grounded spoken captions aligned to mscoco data set, Proc. GLU 2017 International Workshop on Grounding Language Understanding, pp.42-46, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01580879
Stair captions: Constructing a large-scale japanese image caption dataset, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, vol.2, pp.417-421, 2017. ,
Encoding of phonology in a recurrent neural model of grounded speech, Proceedings of the 21st Conference on Computational Natural Language Learning, pp.368-378, 2017. ,
Vision as an interlingua: Learning multilingual semantic embeddings of untranscribed speech, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.4969-4973, 2018. ,
Jointly discovering visual objects and spoken words from raw sensory input, Computer Vision-ECCV 2018, pp.659-677, 2018. ,
Image pivoting for learning multilingual multimodal representations, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp.2839-2845, 2017. ,
Representation of linguistic form and function in recurrent neural networks, Comput. Linguist, vol.43, issue.4, pp.761-780, 2017. ,
Learning word-like units from joint audio-visual analysis, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.506-517, 2017. ,
Visually grounded learning of keyword prediction from untranscribed speech, 2017. ,
Unsupervised learning of spoken language with visual context, Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, pp.1858-1866, 2016. ,
Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language-learner, Cognition, vol.173, pp.43-59, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01888694
Principles of perceptual learning and development, The century psychology series, 1969. ,
Very deep convolutional networks for large-scale image recognition, Proceedings of ICLR 2015, pp.1-14, 2015. ,
Montreal forced aligner: Trainable text-speech alignment using kaldi, 2017. ,
Multilingual processing of speech via web services, Computer Speech & Language, vol.45, pp.326-347, 2017. ,
Probabilistic part-of-speech tagging using decision trees, Studies in Computational Linguistics, pp.154-164, 1997. ,
Pointwise prediction for robust, adaptable japanese morphological analysis, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp.529-533, 2011. ,
A universal part-of-speech tagset, Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC'12), 2012. ,
Why nouns are learned before verbs: Linguistic relativity versus natural partitioning, Language, vol.2, pp.301-334, 1982. ,
Use of bound morphemes (noun particles) in word segmentation by japaneselearning infants, Journal of Memory and Language, vol.88, issue.C, pp.18-27, 2016. ,
Studies of child language development, 1973. ,