E. Bruni, G. Boleda, M. Baroni, and N. Tran, Distributional semantics in technicolor, ACL, 2012.

[BCB15] D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate, ICLR, 2015.

E. Bruni, N. K. Tran, and M. Baroni, Multimodal distributional semantics, JAIR, 2014.

E. Bruni, J. R. Uijlings, M. Baroni, and N. Sebe, Distributional semantics with eyes, Proceedings of the 20th ACM international conference on Multimedia, MM '12, 2012.
DOI : 10.1145/2393347.2396422

G. Collell and M. Moens, Is an Image Worth More than a Thousand Words? On the Fine-Grain Semantic Differences between Visual and Linguistic Representations, COLING, 2016.

G. Collell and M. Moens, Learning representations specialized in spatial knowledge: Leveraging language and vision, TACL, vol.6, pp.133-144, 2018.

G. Collell, T. Zhang, and M. Moens, Imagined visual representations as multimodal embeddings, AAAI, 2017.

L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan et al., Placing search in context, Proceedings of the tenth international conference on World Wide Web, WWW '01, 2001.
DOI : 10.1145/371920.372094

A. M. Glenberg and M. P. Kaschak, Grounding language in action, Psychonomic Bulletin & Review, 2002.

J. Gordon and B. Van Durme, Reporting bias and knowledge acquisition, Proceedings of the 2013 workshop on Automated knowledge base construction, AKBC '13, 2013.
DOI : 10.1145/2509558.2509563

F. Hill and A. Korhonen, Learning abstract concept embeddings from multi-modal data: Since you probably can't see what I mean, EMNLP, 2014.

F. Hill, R. Reichart, and A. Korhonen, Multi-Modal Models for Concrete and Abstract Concept Meaning, TACL, 2014.

F. Hill, R. Reichart, and A. Korhonen, SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation, Computational Linguistics, vol.41, issue.4, 2015.
DOI : 10.1162/COLI_a_00237

D. Kiela and L. Bottou, Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
DOI : 10.3115/v1/D14-1005

D. Kiela, F. Hill, A. Korhonen, and S. Clark, Improving Multi-Modal Representations Using Image Dispersion: Why Less is Sometimes More, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2014.
DOI : 10.3115/v1/P14-2135

S. Kottur, R. Vedantam, J. M. F. Moura, and D. Parikh, VisualWord2Vec (Vis-W2V): Learning Visually Grounded Word Embeddings Using Abstract Scenes, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.539

R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata et al., Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, International Journal of Computer Vision, vol.123, issue.1, 2017.
DOI : 10.1007/s11263-016-0981-7

O. Levy and Y. Goldberg, Dependency-Based Word Embeddings, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2014.
DOI : 10.3115/v1/P14-2050

O. Ludwig, X. Liu, P. Kordjamshidi, and M. Moens, Deep embedding for spatial role labeling.

A. Lazaridou, N. T. Pham, and M. Baroni, Combining language and vision with a multimodal skip-gram model, NAACL, 2015.

[MCSM05] K. McRae, G. S. Cree, M. S. Seidenberg, and C. McNorgan, Semantic feature production norms for a large set of living and nonliving things, Behavior Research Methods, 2005.

A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, Learning word vectors for sentiment analysis, ACL, 2011.

T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, Distributed Representations of Words and Phrases and their Compositionality, NIPS, 2013.

[MZMG16] I. Misra, C. L. Zitnick, M. Mitchell, and R. Girshick, Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels, CVPR, 2016.

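E. Nalisnick, B. Mitra, N. Craswell, and R. Caruana, Improving document ranking with dual word embeddings, WWW, 2016.

D. L. Nelson, C. L. McEvoy, and T. A. Schreiber, The USF free association, rhyme, and word fragment norms, Behavior Research Methods, Instruments, & Computers, 2004.

J. Pennington, R. Socher, and C. D. Manning, GloVe: Global vectors for word representation, EMNLP, 2014.

A. M. Rush, S. Chopra, and J. Weston, A neural attention model for abstractive sentence summarization, EMNLP, 2015.

[ŘS] R. Řehůřek and P. Sojka, Software Framework for Topic Modelling with Large Corpora, Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, 2010.

R. Speer, J. Chin, and C. Havasi, ConceptNet 5.5: An open multilingual graph of general knowledge, AAAI, 2017.

[SL12] C. Silberer and M. Lapata, Grounded models of semantic representation, EMNLP, 2012.

C. Silberer and M. Lapata, Learning grounded meaning representations with autoencoders, ACL, 2014.

[SVI+16] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the inception architecture for computer vision, CVPR, 2016.

[TGCL16] F. Tian, B. Gao, E. Chen, and T.-Y. Liu, Learning better word embedding by asymmetric low-rank projection of knowledge graph, Journal of Computer Science and Technology, 2016.

É. Zablocki, B. Piwowarski, L. Soulier, and P. Gallinari, Learning multi-modal word representation grounded in visual context, Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.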