D. Bahdanau, K. Cho, and Y. Bengio, Neural machine translation by jointly learning to align and translate, 2014.

O. Caglayan, W. Aransa, A. Bardet, M. García-Martínez, F. Bougares et al., LIUM-CVC submissions for WMT17 multimodal translation task, Proceedings of the Second Conference on Machine Translation, vol.2, pp.432-439, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01742382

O. Caglayan, L. Barrault, and F. Bougares, Multimodal attention for neural machine translation, 2016.

O. Caglayan, M. García-Martínez, and A. Bardet, nmtpy: A flexible toolkit for advanced neural machine translation systems, Prague Bull. Math. Linguistics, vol.109, pp.15-28, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01647873

K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.1724-1734, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01433235

J. H. Clark, C. Dyer, A. Lavie, and N. A. Smith, Better hypothesis testing for statistical machine translation: Controlling for optimizer instability, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers, vol.2, pp.176-181, 2011.

D. Elliott, S. Frank, L. Barrault, F. Bougares, and L. Specia, Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description, Proceedings of the Second Conference on Machine Translation, 2017.

D. Elliott, S. Frank, K. Sima'an, and L. Specia, Multi30k: Multilingual English-German image descriptions, Proceedings of the 5th Workshop on Vision and Language, pp.70-74, 2016.

D. Elliott and Á. Kádár, Imagination improves multimodal translation, 2017.

K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, 2016.

K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, The IEEE International Conference on Computer Vision (ICCV), 2015.

J. Helcl and J. Libovický, CUNI system for the WMT17 multimodal translation task, Proceedings of the Second Conference on Machine Translation, vol.2, pp.450-457, 2017.

V. Kazemi and A. Elqursh, Show, ask, attend, and answer: A strong baseline for visual question answering, 2017.

D. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

A. Lavie and A. Agarwal, METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments, Proceedings of the Second Workshop on Statistical Machine Translation, StatMT '07, pp.228-231, 2007.

K. Papineni, S. Roukos, T. Ward, and W. Zhu, BLEU: A method for automatic evaluation of machine translation, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pp.311-318, 2002.

R. Pascanu, T. Mikolov, and Y. Bengio, On the difficulty of training recurrent neural networks, Proceedings of the 30th International Conference on Machine Learning, vol.28, 2013.

O. Press and L. Wolf, Using the output embedding to improve language models, 2016.

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh et al., ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision (IJCV), vol.115, issue.3, pp.211-252, 2015.

R. Sennrich, O. Firat, K. Cho, A. Birch, B. Haddow et al., Nematus: a toolkit for neural machine translation, Proceedings of the EACL 2017 Software Demonstrations, pp.65-68, 2017.

R. Sennrich, B. Haddow, and A. Birch, Neural machine translation of rare words with subword units, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol.1, pp.1715-1725, 2016.

L. Specia, S. Frank, K. Sima'an, and D. Elliott, A shared task on multimodal machine translation and crosslingual image description, Proceedings of the First Conference on Machine Translation, pp.543-553, 2016.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: A simple way to prevent neural networks from overfitting, J. Mach. Learn. Res., vol.15, issue.1, pp.1929-1958, 2014.

K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville et al., Show, attend and tell: Neural image caption generation with visual attention, Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp.2048-2057, 2015.

Z. Yang, X. He, J. Gao, L. Deng, and A. J. Smola, Stacked attention networks for image question answering, CVPR, pp.21-29, 2016.