Z. Akata, S. E. Reed, D. Walter, H. Lee, and B. Schiele, Evaluation of output embeddings for fine-grained image classification, IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, pp.2927-2936, 2015.

Z. Akata, F. Perronnin, Z. Harchaoui, and C. Schmid, Label-embedding for image classification, IEEE Trans. Pattern Anal. Mach. Intell, vol.38, issue.7, pp.1425-1438, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01207145

L. J. Ba, K. Swersky, S. Fidler, and R. Salakhutdinov, Predicting deep zero-shot convolutional neural networks using textual descriptions, 2015 IEEE International Conference on Computer Vision, ICCV 2015, pp.4247-4255, 2015.

A. Bansal, K. Sikka, G. Sharma, R. Chellappa, and A. Divakaran, Zero-shot object detection, Computer Vision - ECCV 2018 - 15th European Conference, pp.397-414, 2018.

S. Bell, C. L. Zitnick, K. Bala, and R. B. Girshick, Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, pp.2874-2883, 2016.

S. Bengio, J. Dean, D. Erhan, E. Ie, Q. V. Le et al., Using web co-occurrence statistics for improving image categorization, CoRR, abs/1312.5697, 2013.

M. Bucher, S. Herbin, and F. Jurie, Improving semantic embedding consistency by metric learning for zero-shot classiffication, Computer Vision - ECCV 2016 - 14th European Conference, pp.730-746, 2016.

Q. Chen, Z. Song, J. Dong, Z. Huang, Y. Hua et al., Contextualizing object detection and classification, IEEE Trans. Pattern Anal. Mach. Intell, vol.37, issue.1, pp.13-27, 2015.

X. Chen, L. Li, L. Fei-Fei, and A. Gupta, Iterative visual reasoning beyond convolutions, 2018 IEEE Conference on Computer Vision and Pattern Recognition, pp.7239-7248, 2018.

W. Chu and D. Cai, Deep feature based contextual model for object detection, Neurocomputing, vol.275, pp.1035-1042, 2018.

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A large-scale hierarchical image database, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.248-255, 2009.

A. Farhadi, I. Endres, D. Hoiem, and D. A. Forsyth, Describing objects by their attributes, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.1778-1785, 2009.

M. Ferrante, N. Ferro, and S. Pontarollo, Are IR evaluation measures on an interval scale?, Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, pp.67-74, 2017.

A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean et al., DeViSE: A deep visual-semantic embedding model, Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, pp.2121-2129, 2013.

Y. Fu, T. M. Hospedales, T. Xiang, and S. Gong, Learning multimodal latent attributes, IEEE Trans. Pattern Anal. Mach. Intell, vol.36, issue.2, pp.303-316, 2014.

Y. Fu, Y. Yang, T. M. Hospedales, T. Xiang, and S. Gong, Transductive multi-label zero-shot learning, 2015.

Z. Fu, T. Xiang, E. Kodirov, and S. Gong, Zero-shot object recognition by semantic manifold distance, IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, pp.2635-2644, 2015.

N. Fuhr, Some common mistakes in IR evaluation, and how they can be avoided, SIGIR Forum, vol.51, issue.3, pp.32-41, 2017.

C. Galleguillos, A. Rabinovich, and S. J. Belongie, Object categorization using co-occurrence, location and appearance, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.24-26, 2008.

Z. S. Harris, Distributional structure, Word, vol.10, issue.2-3, pp.146-162, 1954.

X. He, R. S. Zemel, and M. Á. Carreira-Perpiñán, Multiscale conditional random fields for image labeling, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), pp.695-702, 2004.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, CoRR, abs/1412.6980, 2014.

E. Kodirov, T. Xiang, and S. Gong, Semantic autoencoder for zero-shot learning, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.4447-4456, 2017.

R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata et al., Visual genome: Connecting language and vision using crowdsourced dense image annotations, International Journal of Computer Vision, vol.123, issue.1, pp.32-73, 2017.

C. H. Lampert, H. Nickisch, and S. Harmeling, Learning to detect unseen object classes by between-class attribute transfer, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), pp.951-958, 2009.

C. H. Lampert, H. Nickisch, and S. Harmeling, Attribute-based classification for zero-shot visual object categorization, IEEE Trans. Pattern Anal. Mach. Intell, vol.36, issue.3, pp.453-465, 2014.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, pp.2278-2324, 1998.

Y. LeCun, S. Chopra, and R. Hadsell, A tutorial on energy-based learning, Predicting Structured Data, vol.1, issue.0, 2006.

J. Liu, B. Kuipers, and S. Savarese, Recognizing human actions by attributes, The 24th IEEE Conference on Computer Vision and Pattern Recognition, pp.3337-3344, 2011.

Y. Long, L. Liu, L. Shao, F. Shen, G. Ding et al., From zero-shot learning to conventional supervised classification: Unseen visual data synthesis, 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp.6165-6174, 2017.

C. Lu, R. Krishna, M. S. Bernstein, and L. Fei-Fei, Visual relationship detection with language priors, Computer Vision - ECCV 2016 - 14th European Conference, pp.852-869, 2016.

T. Mensink, J. J. Verbeek, F. Perronnin, and G. Csurka, Metric learning for large scale image classification: Generalizing to new classes at near-zero cost, Computer Vision - ECCV 2012 - 12th European Conference on Computer Vision, pp.488-501, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00722313

T. Mensink, E. Gavves, and C. Snoek, COSTA: Co-occurrence statistics for zero-shot classification, IEEE Conference on Computer Vision and Pattern Recognition, pp.2441-2448, 2014.

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems, pp.3111-3119, 2013.

M. Norouzi, T. Mikolov, S. Bengio, Y. Singer, J. Shlens et al., Zero-shot learning by convex combination of semantic embeddings, CoRR, abs/1312.5650, 2013.

M. Palatucci, D. Pomerleau, G. E. Hinton, and T. M. Mitchell, Zero-shot learning with semantic output codes, Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems, pp.1410-1418, 2009.

D. Parikh and K. Grauman, Interactively building a discriminative vocabulary of nameable attributes, The 24th IEEE Conference on Computer Vision and Pattern Recognition, pp.1681-1688, 2011.

F. J. Pelletier, Did Frege believe Frege's principle?, Journal of Logic, Language and Information, vol.10, issue.1, pp.87-114, 2001.

J. Pennington, R. Socher, and C. D. Manning, GloVe: Global vectors for word representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp.1532-1543, 2014.

M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark et al., Deep contextualized word representations, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, vol.1, pp.2227-2237, 2018.

R. Qiao, L. Liu, C. Shen, and A. van den Hengel, Less is more: Zero-shot learning from online textual documents with noise suppression, 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, pp.2249-2257, 2016.

A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. J. Belongie, Objects in context, IEEE 11th International Conference on Computer Vision, pp.1-8, 2007.

E. Real, A. Aggarwal, Y. Huang, and Q. V. Le, Regularized evolution for image classifier architecture search, 2018.

S. Ren, K. He, R. B. Girshick, and J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems, pp.91-99, 2015.

B. Romera-Paredes and P. H. Torr, An embarrassingly simple approach to zero-shot learning, Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, pp.2152-2161, 2015.

A. M. Schakel and B. J. Wilson, Measuring word significance using distributed representations of words, CoRR, 2015.

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the inception architecture for computer vision, 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp.2818-2826, 2016.

A. Torralba, Contextual priming for object detection, International Journal of Computer Vision, vol.53, issue.2, pp.169-191, 2003.

A. Torralba, K. P. Murphy, and W. T. Freeman, Using the forest to see the trees: exploiting context for visual object detection and localization, Commun. ACM, vol.53, issue.3, pp.107-114, 2010.

C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie, The Caltech-UCSD Birds-200-2011 dataset, 2011.

J. Weston, S. Bengio, and N. Usunier, WSABIE: scaling up to large vocabulary image annotation, IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pp.2764-2770, 2011.

L. Wolf and S. M. Bileschi, A critical view of context, International Journal of Computer Vision, vol.69, issue.2, pp.251-261, 2006.

Y. Xian, Z. Akata, G. Sharma, Q. N. Nguyen, M. Hein et al., Latent embeddings for zero-shot classification, 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, pp.69-77, 2016.

J. Yang, J. Lu, S. Lee, D. Batra, and D. Parikh, Graph R-CNN for scene graph generation, Computer Vision - ECCV 2018 - 15th European Conference, pp.690-706, 2018.

R. Yu, X. Chen, V. I. Morariu, and L. S. Davis, The role of context selection in object detection, Proceedings of the British Machine Vision Conference, 2016.

É. Zablocki, B. Piwowarski, L. Soulier, and P. Gallinari, Learning Multi-Modal Word Representation Grounded in Visual Context, Association for the Advancement of Artificial Intelligence (AAAI), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01632414

R. Zellers, M. Yatskar, S. Thomson, and Y. Choi, Neural motifs: Scene graph parsing with global context, 2018 IEEE Conference on Computer Vision and Pattern Recognition, pp.5831-5840, 2018.

H. Zhang, K. J. Dana, J. Shi, Z. Zhang, X. Wang et al., Context encoding for semantic segmentation, 2018 IEEE Conference on Computer Vision and Pattern Recognition, pp.7151-7160, 2018.

B. Zhao, B. Chang, Z. Jie, and L. Sigal, Modular generative adversarial networks, CoRR, abs/1804.03343, 2018.

B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, Learning transferable architectures for scalable image recognition, 2017.