Labeled faces in the wild: A survey, Advances in Face Detection and Facial Image Analysis, pp.189-248, 2016. ,
Deep recurrent models with fast-forward connections for neural machine translation, Transactions of the Association for Computational Linguistics (TACL), vol.4, pp.371-383, 2016. ,
The IBM 2016 English Conversational Telephone Speech Recognition System, Interspeech 2016, pp.520-527, 2016. ,
DOI : 10.21437/Interspeech.2016-1460
URL : http://arxiv.org/abs/1505.05899
Mastering the game of Go with deep neural networks and tree search, Nature, vol.529, issue.7587, pp.484-489, 2016. ,
ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (NIPS), pp.1097-1105, 2012. ,
DOI : 10.1162/neco.2009.10-08-881
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.299.205
Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.770-778, 2016. ,
DOI : 10.1109/CVPR.2016.90
URL : http://arxiv.org/abs/1512.03385
How transferable are features in deep neural networks?, Advances in Neural Information Processing Systems (NIPS), pp.3320-3328, 2014. ,
Return of the Devil in the Details: Delving Deep into Convolutional Nets, Proceedings of the British Machine Vision Conference 2014, 2014. ,
DOI : 10.5244/C.28.6
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.37, issue.9 ,
DOI : 10.1109/TPAMI.2015.2389824
URL : http://arxiv.org/abs/1406.4729
Multi-scale Orderless Pooling of Deep Convolutional Activation Features, European Conference on Computer Vision (ECCV), pp.392-407, 2014. ,
DOI : 10.1007/978-3-319-10584-0_26
URL : http://arxiv.org/abs/1403.1840
NetVLAD: CNN architecture for weakly supervised place recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.5297-5307, 2016. ,
DOI : 10.1109/cvpr.2016.572
URL : http://arxiv.org/abs/1511.07247
Large-margin weakly supervised dimensionality reduction, International Conference on Machine Learning, 2014. ,
Solving the multiple instance problem with axis-parallel rectangles, Artificial Intelligence, vol.89, issue.1-2, pp.31-71, 1997. ,
DOI : 10.1016/S0004-3702(96)00034-3
URL : http://doi.org/10.1016/s0004-3702(96)00034-3
Object Detection with Discriminatively Trained Part-Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.9, pp.1627-1645, 2010. ,
DOI : 10.1109/TPAMI.2009.167
Training Object Class Detectors from Eye Tracking Data, European Conference on Computer Vision (ECCV), pp.361-376, 2014. ,
DOI : 10.1007/978-3-319-10602-1_24
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.645.7140
Large margin multi-modal multitask feature extraction for image classification, IEEE Trans. Image Processing, vol.17, issue.251, p.414, 2016. ,
Multiview vector-valued manifold regularization for multilabel image classification, IEEE Transactions on Neural Networks and Learning Systems, pp.709-722, 2013. ,
Object bank: A high-level image representation for scene classification & semantic feature sparsification, Advances in Neural Information Processing Systems, pp.1378-1386, 2010. ,
DOI : 10.1007/s11263-013-0660-x
The concave-convex procedure (CCCP), p.1033, 2001. ,
DOI : 10.1162/08997660360581958
Gaze latent support vector machine for image classification, 2016 IEEE International Conference on Image Processing (ICIP), pp.236-240, 2016. ,
DOI : 10.1109/ICIP.2016.7532354
URL : https://hal.archives-ouvertes.fr/hal-01342580
Multiple instance learning for soft bags via top instances, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4277-4285, 2015. ,
DOI : 10.1109/cvpr.2015.7299056
Can computers learn from humans to see better?: inferring scene semantics from viewers' eye movements, International Conference on Multimedia, pp.33-42, 2011. ,
Studying Relationships between Human Gaze, Description, and Computer Vision, 2013 IEEE Conference on Computer Vision and Pattern Recognition, p.739, 2013. ,
DOI : 10.1109/CVPR.2013.101
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.294.4727
Actions in the Eye: Dynamic Gaze Datasets and Learnt Saliency Models for Visual Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.37, issue.7 ,
DOI : 10.1109/TPAMI.2014.2366154
Action classification in still images using human eye movements, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp.2015-2031, 2015. ,
DOI : 10.1109/CVPRW.2015.7301288
From Where and How to What We See, 2013 IEEE International Conference on Computer Vision, p.625, 2013. ,
DOI : 10.1109/ICCV.2013.83
Shallow and Deep Convolutional Networks for Saliency Prediction, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.598-606, 2016. ,
DOI : 10.1109/CVPR.2016.71
URL : http://arxiv.org/abs/1603.00845
Saliency Unified: A Deep Architecture for simultaneous Eye Fixation Prediction and Salient Object Segmentation, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.5781-5790, 2016. ,
DOI : 10.1109/CVPR.2016.623
Can you see it? Two novel eye-tracking-based measures for assigning tags to image regions, Advances in Multimedia Modeling, International Conference, pp.36-46, 2013. ,
DOI : 10.1007/978-3-642-35725-1_4
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.294.4905
Eye tracking assisted extraction of attentionally important objects from videos, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3241-3250, 2015. ,
Action is in the eye of the beholder: Eye-gaze driven model for spatio-temporal action localization ,
You-Do, I-Learn: Egocentric unsupervised discovery of objects and their modes of interaction towards video-based guidance, Computer Vision and Image Understanding, vol.149, pp.98-112, 2016. ,
DOI : 10.1016/j.cviu.2016.02.016
Gaze-enabled egocentric video summarization via constrained submodular maximization, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2235-2244, 2015. ,
DOI : 10.1109/cvpr.2015.7298836
URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4784707
Crowdsourcing Annotations for Visual Object Detection, pp.1-6, 2012. ,
Robust Higher Order Potentials for Enforcing Label Consistency, International Journal of Computer Vision, vol.24, issue.3, pp.302-324, 2009. ,
DOI : 10.1016/S0166-218X(01)00341-9
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.187.8646
One gaze is worth ten thousand (key-)words, IEEE International Conference on Image Processing (ICIP), pp.3150-3154, 2015. ,
DOI : 10.1109/icip.2015.7351384
Action from still image dataset and inverse optimal control to learn task specific visual scanpaths, Advances in Neural Information Processing Systems, pp.1923-1931, 2013. ,
PET: An eye-tracking dataset for animal-centric Pascal object classes, 2015 IEEE International Conference on Multimedia and Expo (ICME), pp.1-6, 2015. ,
DOI : 10.1109/ICME.2015.7177450
URL : http://arxiv.org/abs/1604.01574
Recipe recognition with large multimodal food dataset ,
URL : https://hal.archives-ouvertes.fr/hal-01196959
Relaxed Multiple-Instance SVM with Application to Object Discovery, 2015 IEEE International Conference on Computer Vision (ICCV), pp.1224-1232, 2015. ,
DOI : 10.1109/ICCV.2015.145
URL : http://arxiv.org/abs/1510.01027
Multiple instance subspace learning via partial random projection tree for local reflection symmetry in natural images, Pattern Recognition, vol.52, pp.306-316, 2016. ,
DOI : 10.1016/j.patcog.2015.10.015
Blocks That Shout: Distinctive Parts for Scene Classification, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.923-930, 2013. ,
DOI : 10.1109/CVPR.2013.124
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.649.1623
Learning Discriminative Part Detectors for Image Classification and Cosegmentation, 2013 IEEE International Conference on Computer Vision, pp.3400-3407, 2013. ,
DOI : 10.1109/ICCV.2013.422
URL : https://hal.archives-ouvertes.fr/hal-00932380
Max-margin multiple-instance dictionary learning, International Conference on Machine Learning, p.846, 2013. ,
Generalized Dictionaries for Multiple Instance Learning, International Journal of Computer Vision, vol.60, issue.4, pp.288-305, 2015. ,
DOI : 10.1109/CVPR.2010.5539989
Support vector machines for multiple-instance learning, Advances in Neural Information Processing Systems (NIPS), pp.561-568, 2002. ,
Incremental learning of latent structural SVM for weakly supervised image classification, 2014 IEEE International Conference on Image Processing (ICIP), pp.4246-4250, 2014. ,
DOI : 10.1109/ICIP.2014.7025862
URL : https://hal.archives-ouvertes.fr/hal-01077058
Object and Action Classification with Latent Window Parameters, International Journal of Computer Vision, vol.15, issue.4, pp.237-251, 2014. ,
DOI : 10.1109/CVPR.2010.5540096
Spotlight the Negatives: A Generalized Discriminative Latent Model, Proceedings of the British Machine Vision Conference 2015, pp.1-11, 2015. ,
DOI : 10.5244/C.29.18
URL : http://arxiv.org/abs/1507.02144
MANTRA: Minimum Maximum Latent Structural SVM for Image Classification and Ranking, 2015 IEEE International Conference on Computer Vision (ICCV), pp.2713-2721, 2015. ,
DOI : 10.1109/ICCV.2015.311
URL : https://hal.archives-ouvertes.fr/hal-01343784
WELDON: Weakly Supervised Learning of Deep Convolutional Neural Networks, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4743-4752, 2016. ,
DOI : 10.1109/CVPR.2016.513
URL : https://hal.archives-ouvertes.fr/hal-01343785
Multiple instance reinforcement learning for efficient weakly-supervised detection in images ,
Reinforcement learning for visual object detection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2894-2902, 2016. ,
DOI : 10.1109/cvpr.2016.316
GazeDPM: Early integration of gaze information in deformable part models ,
Learning using privileged information: Similarity control and knowledge transfer, J. Mach. Learn. Res, vol.16, pp.2023-2049, 2015. ,
DOI : 10.1007/978-3-319-17091-6_1
Privileged multi-label learning, International Joint Conference on Artificial Intelligence (IJCAI), 2017. ,
Cutting-plane training of structural SVMs, Machine Learning, vol.6, issue.2, pp.27-59, 2009. ,
DOI : 10.1007/s10994-009-5108-8
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.1367
The attraction of visual attention to texts in real-world scenes, Journal of Vision, vol.12, pp.1-17, 2012. ,
Tobii Studio User's Manual Version 3, 2016. ,
The Tobii I-VT Fixation Filter, 2012. ,
What do we perceive in a glance of a real-world scene?, Journal of Vision, vol.7, pp.1-29, 2007. ,
The Pascal visual object classes challenge: A retrospective, International Journal of Computer Vision, vol.111, issue.1, pp.98-136, 2015. ,
Deep Fishing: Gradient Features from Deep Nets, Proceedings of the British Machine Vision Conference 2015, pp.1-12, 2015. ,
DOI : 10.5244/C.29.111
URL : http://arxiv.org/abs/1507.06429
Visualizing and Understanding Convolutional Networks, European Conference on Computer Vision, pp.818-833, 2014. ,
DOI : 10.1007/978-3-319-10590-1_53
URL : http://arxiv.org/abs/1311.2901
Contextualizing object detection and classification, CVPR 2011, pp.1585-1592, 2011. ,
DOI : 10.1109/CVPR.2011.5995330
Learning and transferring mid-level image representations using convolutional neural networks, IEEE CVPR, pp.1717-1724, 2014. ,
DOI : 10.1109/cvpr.2014.222
URL : https://hal.archives-ouvertes.fr/hal-00911179
Actions and attributes from wholes and parts, IEEE International Conference on Computer Vision (ICCV), pp.2470-2478, 2015. ,
DOI : 10.1109/iccv.2015.284
URL : http://arxiv.org/abs/1412.2604
Regularized Max Pooling for Image Categorization, Proceedings of the British Machine Vision Conference 2014, pp.1-12, 2014. ,
DOI : 10.5244/C.28.32