M. A. Fischler and R. A. Elschlager, The Representation and Matching of Pictorial Structures, IEEE Transactions on Computers, vol.22, issue.1, pp.67-92, 1973.
DOI : 10.1109/T-C.1973.223602

M. Weber, M. Welling, and P. Perona, Towards automatic discovery of object categories, Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2000.
DOI : 10.1109/CVPR.2000.854754

S. Ullman, E. Sali, and M. Vidal-Naquet, A Fragment-Based Approach to Object Representation and Classification, In: Visual Form, 2001.
DOI : 10.1007/3-540-45129-3_7

P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, Object detection with discriminatively trained part-based models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1627-1645, 2010.

C. Doersch, A. Gupta, and A. A. Efros, Mid-level visual element discovery as discriminative mode seeking, Advances in Neural Information Processing Systems, 2013.

S. Singh, A. Gupta, and A. A. Efros, Unsupervised Discovery of Mid-Level Discriminative Patches, European Conference on Computer Vision, pp.73-86, 2012.
DOI : 10.1007/978-3-642-33709-3_6

M. Juneja, A. Vedaldi, C. Jawahar, and A. Zisserman, Blocks That Shout: Distinctive Parts for Scene Classification, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.124

C. Doersch, S. Singh, A. Gupta, J. Sivic, and A. Efros, What makes Paris look like Paris?, ACM Transactions on Graphics, vol.31, issue.4, 2012.
URL : https://hal.archives-ouvertes.fr/hal-01053876

S. N. Parizi, A. Vedaldi, A. Zisserman, and P. Felzenszwalb, Automatic Discovery and Optimization of Parts for Image Classification, International Conference on Learning Representations, 2015.

H. Lobel, R. Vidal, and A. Soto, Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.213

M. Hoai and A. Zisserman, Improving Human Action Recognition Using Score Distribution and Ranking, Asian Conference on Computer Vision, 2014.
DOI : 10.1007/978-3-319-16814-2_1

L. Mason, J. Baxter, P. Bartlett, and M. Frean, Boosting algorithms as gradient descent in function space, NIPS, 1999.

J. Friedman, T. Hastie, and R. Tibshirani, Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors), The Annals of Statistics, vol.28, issue.2, pp.337-407, 2000.
DOI : 10.1214/aos/1016218223

M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.
DOI : 10.1109/CVPR.2014.222
URL : https://hal.archives-ouvertes.fr/hal-00911179

K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, Return of the Devil in the Details: Delving Deep into Convolutional Nets, Proceedings of the British Machine Vision Conference 2014, 2014.
DOI : 10.5244/C.28.6

A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, CNN features off-the-shelf: an astounding baseline for recognition, In: Computer Vision and Pattern Recognition Workshops, 2014.

M. Everingham, S. M. Eslami, L. Van Gool, C. K. Williams, J. Winn et al., The Pascal Visual Object Classes Challenge: A Retrospective, International Journal of Computer Vision, vol.111, issue.1, pp.98-136, 2015.
DOI : 10.1007/s11263-014-0733-5

Y. Gong, L. Wang, R. Guo, and S. Lazebnik, Multi-scale Orderless Pooling of Deep Convolutional Activation Features, European Conference on Computer Vision, 2014.
DOI : 10.1007/978-3-319-10584-0_26

P. Kulkarni, J. Zepeda, F. Jurie, P. Pérez, and L. Chevallier, Hybrid multi-layer deep CNN/aggregator feature for image classification, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.
DOI : 10.1109/ICASSP.2015.7178196

M. Cimpoi, S. Maji, and A. Vedaldi, Deep filter banks for texture recognition and segmentation, IEEE Conference on Computer Vision and Pattern Recognition, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01263622

H. Jégou, M. Douze, C. Schmid, and P. Pérez, Aggregating local descriptors into a compact image representation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5540039

Y. Li, L. Liu, C. Shen, and A. van den Hengel, Mid-level deep pattern mining, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298699

R. Sicre and F. Jurie, Discovering and Aligning Discriminative Mid-level Features for Image Classification, 2014 22nd International Conference on Pattern Recognition, 2014.
DOI : 10.1109/ICPR.2014.345
URL : https://hal.archives-ouvertes.fr/hal-00996303

C. Gulcehre, K. Cho, R. Pascanu, and Y. Bengio, Learned-Norm Pooling for Deep Feedforward and Recurrent Neural Networks, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.530-546, 2014.
DOI : 10.1007/978-3-662-44848-9_34

C. Y. Lee, P. W. Gallagher, and Z. Tu, Generalizing pooling functions in convolutional neural networks: Mixed, gated, and tree, International Conference on Artificial Intelligence and Statistics, 2016.

A. Quattoni and A. Torralba, Recognizing indoor scenes, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206537

V. Delaitre, I. Laptev, and J. Sivic, Recognizing human actions in still images: a study of bag-of-features and part-based representations, Proceedings of the British Machine Vision Conference 2010, 2010.
DOI : 10.5244/C.24.97
URL : https://hal.archives-ouvertes.fr/hal-01060885

K. E. van de Sande, J. R. Uijlings, T. Gevers, and A. W. Smeulders, Segmentation as selective search for object recognition, 2011 International Conference on Computer Vision, 2011.
DOI : 10.1109/ICCV.2011.6126456

N. Chavali, H. Agrawal, A. Mahendru, and D. Batra, Object-Proposal Evaluation Protocol is 'Gameable', 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
DOI : 10.1109/CVPR.2016.97

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long et al., Caffe: Convolutional Architecture for Fast Feature Embedding, Proceedings of the ACM International Conference on Multimedia, MM '14, 2014.
DOI : 10.1145/2647868.2654889

K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, 2014.

B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, Learning deep features for scene recognition using Places database, Advances in Neural Information Processing Systems, 2014.

P. Kulkarni, J. Zepeda, F. Jurie, P. Pérez, and L. Chevallier, Max-Margin, Single-Layer Adaptation of Transferred Image Features, In: BigVision Workshop, Computer Vision and Pattern Recognition, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.37, issue.9, pp.1904-1916, 2015.

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed et al., Going deeper with convolutions, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7298594

F. S. Khan, R. M. Anwer, J. van de Weijer, A. D. Bagdanov, A. M. Lopez et al., Coloring Action Recognition in Still Images, International Journal of Computer Vision, vol.73, issue.2, pp.205-221, 2013.
DOI : 10.1007/s11263-013-0633-0

G. Sharma, F. Jurie, and C. Schmid, Discriminative spatial saliency for image classification, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6248093
URL : https://hal.archives-ouvertes.fr/hal-00714311

G. Sharma, F. Jurie, and C. Schmid, Expanded Parts Model for Human Attribute and Action Recognition in Still Images, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.90
URL : https://hal.archives-ouvertes.fr/hal-00816144