S. Avila, N. Thome, M. Cord, E. Valle, and A. Araujo, Pooling in image representation: The visual codeword point of view, Computer Vision and Image Understanding, vol.117, issue.5, 2012.
DOI : 10.1016/j.cviu.2012.09.007

URL : https://hal.archives-ouvertes.fr/hal-01172709

A. Behl, C. V. Jawahar, and M. P. Kumar, Optimizing average precision using weakly supervised data, CVPR, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00984699

H. Bilen, V. Namboodiri, and L. Van Gool, Object and Action Classification with Latent Window Parameters, IJCV, 2013.
DOI : 10.1007/s11263-013-0646-8

M. Blaschko, P. Kumar, and B. Taskar, Tutorial: Visual learning with weak supervision

K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, Return of the Devil in the Details: Delving Deep into Convolutional Nets, Proceedings of the British Machine Vision Conference 2014, 2014.
DOI : 10.5244/C.28.6

T. Do and T. Artières, Regularized bundle methods for convex and non-convex risks, JMLR, issue.4, 2012.

C. Doersch, A. Gupta, and A. A. Efros, Mid-level visual element discovery as discriminative mode seeking, NIPS, 2013.

T. Durand, N. Thome, M. Cord, and D. Picard, Incremental learning of latent structural SVM for weakly supervised image classification, 2014 IEEE International Conference on Image Processing (ICIP), 2014.
DOI : 10.1109/ICIP.2014.7025862

URL : https://hal.archives-ouvertes.fr/hal-01077058

P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, 2010.
DOI : 10.1109/TPAMI.2009.167

R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.81

H. Goh, N. Thome, M. Cord, and J. Lim, Learning Deep Hierarchical Visual Feature Coding, IEEE Transactions on Neural Networks and Learning Systems, 2014.
DOI : 10.1109/TNNLS.2014.2307532

URL : https://hal.archives-ouvertes.fr/hal-01185465

Y. Gong, L. Wang, R. Guo, and S. Lazebnik, Multi-scale Orderless Pooling of Deep Convolutional Activation Features, ECCV, 2014.
DOI : 10.1007/978-3-319-10584-0_26

K. He, X. Zhang, S. Ren, and J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, ECCV, 2014.

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long et al., Caffe, Proceedings of the ACM International Conference on Multimedia, MM '14, 2014.
DOI : 10.1145/2647868.2654889

T. Joachims, T. Finley, and C. Yu, Cutting-plane training of structural SVMs, Machine Learning, 2009.
DOI : 10.1007/s10994-009-5108-8

M. Juneja, A. Vedaldi, C. V. Jawahar, and A. Zisserman, Blocks That Shout: Distinctive Parts for Scene Classification, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.124

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet classification with deep convolutional neural networks, NIPS, 2012.

P. Kumar, B. Packer, and D. Koller, Self-paced learning for latent variable models, NIPS, 2010.

M. T. Law, N. Thome, and M. Cord, Fantope Regularization in Metric Learning, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.138

URL : https://hal.archives-ouvertes.fr/hal-01094074

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.68

URL : https://hal.archives-ouvertes.fr/inria-00548585

L.-J. Li and L. Fei-Fei, What, where and who? Classifying events by scene and object recognition, 2007 IEEE 11th International Conference on Computer Vision, 2007.
DOI : 10.1109/ICCV.2007.4408872

L.-J. Li, H. Su, E. P. Xing, and L. Fei-Fei, Object bank: A high-level image representation for scene classification & semantic feature sparsification, NIPS, 2010.

P. Mohapatra, C. Jawahar, and M. P. Kumar, Efficient optimization for average precision SVM, NIPS, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01069917

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.222

URL : https://hal.archives-ouvertes.fr/hal-00911179

M. Pandey and S. Lazebnik, Scene recognition and weakly supervised object localization with deformable part-based models, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126383

S. N. Parizi, J. G. Oberlin, and P. F. Felzenszwalb, Reconfigurable models for scene recognition, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248001

F. Perronnin and C. R. Dance, Fisher Kernels on Visual Vocabularies for Image Categorization, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383266

A. Quattoni and A. Torralba, Recognizing indoor scenes, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206537

O. Russakovsky, Y. Lin, K. Yu, and L. Fei-Fei, Object-centric spatial pooling for image classification, ECCV, 2012.

F. Sadeghi and M. F. Tappen, Latent Pyramidal Regions for Recognizing Scenes, ECCV, 2012.
DOI : 10.1007/978-3-642-33715-4_17

T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, Robust object recognition with cortex-like mechanisms, PAMI, 2007.

G. Sharma, F. Jurie, and C. Schmid, Discriminative spatial saliency for image classification, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248093

URL : https://hal.archives-ouvertes.fr/hal-00714311

J. Sun and J. Ponce, Learning Discriminative Part Detectors for Image Classification and Cosegmentation, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.422

URL : https://hal.archives-ouvertes.fr/hal-00932380

C. Thériault, N. Thome, and M. Cord, Dynamic Scene Classification: Learning Motion Descriptors with Slow Features Analysis, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.336

C. Thériault, N. Thome, and M. Cord, Extended Coding and Pooling in the HMAX Model, IEEE Transactions on Image Processing, vol.22, issue.2, 2013.
DOI : 10.1109/TIP.2012.2222900

I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, Large margin methods for structured and interdependent output variables, JMLR, issue.3, 2005.

B. Yao and L. Fei-Fei, Grouplet: A structured image representation for recognizing human and object interactions, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540234

C. Yu and T. Joachims, Learning structural SVMs with latent variables, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553523

Y. Yue, T. Finley, F. Radlinski, and T. Joachims, A support vector method for optimizing average precision, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '07, 2007.
DOI : 10.1145/1277741.1277790

A. L. Yuille and A. Rangarajan, The Concave-Convex Procedure, Neural Computation, vol.15, issue.4, 2003.
DOI : 10.1162/08997660260028674

N. Zhang, M. Paluri, M. Ranzato, T. Darrell, and L. Bourdev, PANDA: Pose Aligned Networks for Deep Attribute Modeling, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.212

B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, Learning Deep Features for Scene Recognition using Places Database, NIPS, 2014.