J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A Large-Scale Hierarchical Image Database, CVPR, 2009.

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet classification with deep convolutional neural networks, NIPS, 2012.

X. Wang, D. Kumar, N. Thome, M. Cord, and F. Precioso, Recipe recognition with large multimodal food dataset, ICME workshop, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01196959

M. Chevalier, N. Thome, M. Cord, J. Fournier, G. Henaff et al., LR-CNN for fine-grained classification with varying resolution, 2015 IEEE International Conference on Image Processing (ICIP), 2015.
DOI : 10.1109/ICIP.2015.7351374
URL : https://hal.archives-ouvertes.fr/hal-01196958

M. Blaschko, P. Kumar, and B. Taskar, Tutorial: Visual learning with weak supervision, CVPR, 2013.

T. Durand, N. Thome, M. Cord, and S. E. Fontes de Avila, Image classification using object detectors, 2013 IEEE International Conference on Image Processing, 2013.
DOI : 10.1109/ICIP.2013.6738894
URL : https://hal.archives-ouvertes.fr/hal-01078079

S. Andrews, I. Tsochantaridis, and T. Hofmann, Support Vector Machines for Multiple-Instance Learning, NIPS, 2002.

P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, 2010.
DOI : 10.1109/TPAMI.2009.167

M. Pandey and S. Lazebnik, Scene recognition and weakly supervised object localization with deformable part-based models, ICCV, 2011.
DOI : 10.1109/ICCV.2011.6126383

O. Russakovsky, Y. Lin, K. Yu, and L. Fei-Fei, Object-Centric Spatial Pooling for Image Classification, ECCV, 2012.
DOI : 10.1007/978-3-642-33709-3_1

H. Bilen, V. P. Namboodiri, and L. J. Van Gool, Object and Action Classification with Latent Window Parameters, International Journal of Computer Vision, vol.106, issue.3, 2014.
DOI : 10.1007/s11263-013-0646-8

M. Juneja, C. V. Jawahar, A. Zisserman, and A. Vedaldi, Blocks That Shout: Distinctive Parts for Scene Classification, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
DOI : 10.1109/CVPR.2013.124

J. Sun and J. Ponce, Learning Discriminative Part Detectors for Image Classification and Cosegmentation, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.422
URL : https://hal.archives-ouvertes.fr/hal-00932380

P. Kumar, B. Packer, and D. Koller, Self-paced learning for latent variable models, NIPS, 2010.

T. Durand, N. Thome, M. Cord, and D. Picard, Incremental learning of latent structural SVM for weakly supervised image classification, 2014 IEEE International Conference on Image Processing (ICIP), 2014.
DOI : 10.1109/ICIP.2014.7025862
URL : https://hal.archives-ouvertes.fr/hal-01077058

W. Li and N. Vasconcelos, Multiple instance learning for soft bags via top instances, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI : 10.1109/CVPR.2015.7299056

T. Durand, N. Thome, and M. Cord, MANTRA: Minimum Maximum Latent Structural SVM for Image Classification and Ranking, 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
DOI : 10.1109/ICCV.2015.311
URL : https://hal.archives-ouvertes.fr/hal-01343784

T. Durand, N. Thome, and M. Cord, WELDON: Weakly Supervised Learning of Deep Convolutional Neural Networks, CVPR, 2016.

D. P. Papadopoulos, A. D. Clarke, F. Keller et al., Training object class detectors from eye tracking data, ECCV, 2014.

H. Su, J. Deng, and L. Fei-Fei, Crowdsourcing Annotations for Visual Object Detection, AAAI Workshop, 2012.

P. Kohli, L. Ladický, and P. H. Torr, Robust Higher Order Potentials for Enforcing Label Consistency, pp.302-324, 2009.

S. Lopez, A. Revel, D. Lingrand, and F. Precioso, One gaze is worth ten thousand (key-)words, 2015 IEEE International Conference on Image Processing (ICIP), 2015.
DOI : 10.1109/ICIP.2015.7351384
URL : https://hal.archives-ouvertes.fr/hal-01323204

S. Mathe and C. Sminchisescu, Action from still image dataset and inverse optimal control to learn task specific visual scanpaths, NIPS, 2013.

S. Karthikeyan, V. Jagadeesh, R. Shenoy, M. Eckstein, and B. S. Manjunath, From Where and How to What We See, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.83

A. Fathi, Y. Li, and J. M. Rehg, Learning to Recognize Daily Actions Using Gaze, ECCV, 2012.
DOI : 10.1007/978-3-642-33718-5_23

G. Ge, K. Yun, D. Samaras, and G. J. Zelinsky, Action Classification in Still Images Using Human Eye Movements, CVPRW, 2015.

I. Shcherbatyi, A. Bulling, and M. Fritz, GazeDPM: Early Integration of Gaze Information in Deformable Part Models, 2015.

N. Shapovalova, M. Raptis, L. Sigal, and G. Mori, Action is in the eye of the beholder: Eye-gaze driven model for spatio-temporal action localization, NIPS, 2013.

T. Joachims, T. Finley, and C. Yu, Cutting-plane training of structural SVMs, Machine Learning, vol.77, issue.1, pp.27-59, 2009.
DOI : 10.1007/s10994-009-5108-8

A. L. Yuille and A. Rangarajan, The Concave-Convex Procedure (CCCP), NIPS, 2002.

L. Li, H. Su, E. P. Xing, and L. Fei-Fei, Object bank: A high-level image representation for scene classification & semantic feature sparsification, NIPS, 2010.

M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. Williams, J. Winn et al., The Pascal Visual Object Classes Challenge: A Retrospective, International Journal of Computer Vision, vol.111, issue.1, pp.98-136, 2015.
DOI : 10.1007/s11263-014-0733-5