M. Campr and K. Ježek, Comparing Semantic Models for Evaluating Automatic Document Summarization, Text, Speech, and Dialogue, 2015.
DOI : 10.1007/978-3-319-24033-6_29

M. Cha, Y. Gwon, and H. T. Kung, Multimodal sparse representation learning and applications, CoRR, abs/1511, 2015.

M. Eskevich, R. Aly, D. N. Racca, R. Ordelman, S. Chen et al., The search and hyperlinking task at MediaEval 2014, Working Notes MediaEval Workshop, 2014.

F. Feng, X. Wang, and R. Li, Cross-modal retrieval with correspondence autoencoder, ACM Intl. Conf. on Multimedia, pp. 7-16, 2014.

G. E. Hinton and R. R. Salakhutdinov, Reducing the Dimensionality of Data with Neural Networks, Science, vol. 313, no. 5786, pp. 504-507, 2006.
DOI : 10.1126/science.1127647

H. Lu, Y. Liou, H. Lee, and L. Lee, Semantic retrieval of personal photos using a deep autoencoder fusing visual features with speech annotations represented as word/paragraph vectors, Annual Conf. of the Intl. Speech Communication Association, 2015.

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, 2013.

N. Srivastava and R. Salakhutdinov, Learning representations for multimodal data with deep belief nets, Intl. Conf. on Machine Learning, 2012.

J. Weston, S. Bengio, and N. Usunier, Large scale image annotation: learning to rank with joint word-image embeddings, Machine Learning, vol. 81, no. 1, pp. 21-35, 2010.
DOI : 10.1007/s10994-010-5198-3