R. Albatal, P. Mulhem, and Y. Chiaramella, Visual Phrases for automatic images annotation, 2010 International Workshop on Content Based Multimedia Indexing (CBMI), pp.1-6, 2010.
DOI : 10.1109/CBMI.2010.5529909

D. Arthur and S. Vassilvitskii, k-means++: the advantages of careful seeding, Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pp.1027-1035, 2007.

M. Z. Aziz and B. Mertsching, Fast and Robust Generation of Feature Maps for Region-Based Visual Attention, IEEE Transactions on Image Processing, vol.17, issue.5, pp.633-644, 2008.
DOI : 10.1109/TIP.2008.919365

Unis, CNRS, RENATER, several Universities, other funding bodies (see https, 2012.

H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, Speeded-Up Robust Features (SURF), Computer Vision and Image Understanding, vol.110, issue.3, pp.346-359, 2008.
DOI : 10.1016/j.cviu.2007.09.014

A. F. Bobick and J. W. Davis, The recognition of human movement using temporal templates, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, issue.3, pp.257-267, 2001.

O. Boiman, E. Shechtman, and M. Irani, In defense of nearest-neighbor based image classification, IEEE Conference on Computer Vision and Pattern Recognition, 2008.

A. Borji and L. Itti, State-of-the-Art in Visual Attention Modeling, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.1, pp.185-207, 2013.
DOI : 10.1109/TPAMI.2012.89

J.-Y. Bouguet, Pyramidal implementation of the Lucas Kanade feature tracker, Intel Corporation, 2000.

H. Boujut, J. Benois-Pineau, T. Ahmed, O. Hadar, and P. Bonnet, A metric for no-reference video quality assessment for HD TV delivery based on saliency maps, 2011 IEEE International Conference on Multimedia and Expo, 2011.
DOI : 10.1109/ICME.2011.6012136

URL : https://hal.archives-ouvertes.fr/hal-00589182

H. Boujut, J. Benois-Pineau, and R. Mégret, Fusion of Multiple Visual Cues for Visual Saliency Extraction from Wearable Camera Settings with Strong Motion, Computer Vision ECCV 2012, Workshops and Demonstrations, pp.436-445, 2012.
DOI : 10.1007/978-3-642-33885-4_44

URL : https://hal.archives-ouvertes.fr/hal-00742089

Y.-L. Boureau, F. Bach, Y. LeCun, and J. Ponce, Learning mid-level features for recognition, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539963

O. Brouard, V. Ricordel, and D. Barba, Cartes de saillance spatio-temporelle basées contrastes de couleur et mouvement relatif [Spatio-temporal saliency maps based on colour contrast and relative motion], Compression et Représentation des Signaux Audiovisuels (CORESA), 2009.

A. Bur and H. Hügli, Optimal Cue Combination for Saliency Computation: A Comparison with Human Vision, Proceedings of the 2nd international work-conference on Nature Inspired Problem-Solving Methods in Knowledge Engineering: Interplay Between Natural and Artificial Computation, Part II, IWINAC '07, pp.109-118, 2007.
DOI : 10.1007/978-3-540-73055-2_13

S. J. Daly, Engineering Observations from Spatiovelocity and Spatiotemporal Visual Models, IS&T/SPIE Conference on Human Vision and Electronic Imaging III, 1998.
DOI : 10.1007/978-1-4757-3411-9_9

Visual search for objects in a complex visual context: what we wish to see

M. Dorr, T. Martinetz, K. R. Gegenfurtner, and E. Barth, Variability of eye movements when viewing dynamic natural scenes, Journal of Vision, vol.10, issue.10, p.28, 2010.

A. Fathi, Y. Li, and J. Rehg, Learning to Recognize Daily Actions Using Gaze, Computer Vision ECCV 2012, pp.314-327, 2012.
DOI : 10.1007/978-3-642-33718-5_23

R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2001.

K. Grauman and T. Darrell, The pyramid match kernel: discriminative classification with sets of image features, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, 2005.
DOI : 10.1109/ICCV.2005.239

D. C. Hood and M. A. Finkelstein, Sensitivity to light, Handbook of perception and human performance, pp.5-6, 1986.

B. Ionescu, C. Vertan, P. Lambert, and A. Benoit, A color-action perceptual approach to the classification of animated movies, Proceedings of the 1st ACM International Conference on Multimedia Retrieval, ICMR '11, pp.1-10, 2011.
DOI : 10.1145/1991996.1992006

URL : https://hal.archives-ouvertes.fr/hal-00623707

L. Itti, Quantifying the contribution of low-level saliency to human eye movements in dynamic scenes, Visual Cognition, vol.12, issue.6, pp.1093-1123, 2005.
DOI : 10.1080/13506280444000661

L. Itti and C. Koch, Computational modelling of visual attention, Nature Reviews Neuroscience, vol.2, issue.3, pp.194-203, 2001.
DOI : 10.1038/35058500

F. Jing, M. Li, H. J. Zhang, and B. Zhang, An effective region-based image retrieval framework, ACM International conference on Multimedia, 2002.

S. Karaman, J. Benois-Pineau, R. Mégret, and A. Bugeau, Multi-layer Local Graph Words for Object Recognition, Advances in Multimedia Modeling, 2012.

URL : https://hal.archives-ouvertes.fr/hal-00637120

P. Kraemer, J. Benois-Pineau, and J. Domenger, Scene Similarity Measure for Video Content Segmentation in the Framework of Rough Indexing Paradigm, 2nd International Workshop on Adaptive Multimedia Retrieval, 2004.

M. Land, N. Mennie, and J. Rusted, The Roles of Vision and Eye Movements in the Control of Activities of Daily Living, Perception, vol.28, issue.11, pp.1311-1328, 1999.
DOI : 10.1068/p2935

I. Laptev, On space-time interest points, International Journal of Computer Vision, vol.64, issue.2-3, pp.107-123, 2005.

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), pp.2169-2178, 2006.
DOI : 10.1109/CVPR.2006.68

URL : https://hal.archives-ouvertes.fr/inria-00548585

O. Le Meur, P. Le Callet, and D. Barba, Predicting visual fixations on video based on low-level visual features, Vision Research, vol.47, issue.19, pp.2483-2498, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00287424

Y. J. Lee, J. Ghosh, and K. Grauman, Discovering important people and objects for egocentric video summarization, IEEE Conference on Computer Vision and Pattern Recognition, 2012.

F. Long, H. Zhang, and D. D. Feng, Fundamentals of Content-Based Image Retrieval, Multimedia Information Retrieval and Management, 2003.
DOI : 10.1007/978-3-662-05300-3_1

D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94

J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online learning for matrix factorization and sparse coding, Journal of Machine Learning Research, vol.11, pp.19-60, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00408716

B. S. Manjunath, J. R. Ohm, V. V. Vasudevan, and A. Yamada, Colour and texture descriptors, IEEE Transactions on Circuits and Systems for Video Technology, vol.11, issue.6, pp.703-715, 2001.

S. Marat, T. Ho-Phuoc, L. Granjon, N. Guyader, D. Pellerin et al., Modelling Spatio-Temporal Saliency to Predict Gaze Direction for Short Videos, International Journal of Computer Vision, vol.82, issue.3, pp.231-243, 2009.
DOI : 10.1007/s11263-009-0215-3

URL : https://hal.archives-ouvertes.fr/hal-00368496

F. Mokhtarian and R. Suomela, Robust image corner detection through curvature scale space, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.20, issue.12, pp.1376-1381, 1998.
DOI : 10.1109/34.735812

D. Nistér and H. Stewénius, Scalable Recognition with a Vocabulary Tree, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), pp.2161-2168, 2006.
DOI : 10.1109/CVPR.2006.264

H. Pirsiavash and D. Ramanan, Detecting activities of daily living in first-person camera views, 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.2847-2854, 2012.
DOI : 10.1109/CVPR.2012.6248010

M. Pomplun, H. Ritter, and B. Velichkovsky, Disambiguating Complex Visual Information: Towards Communication of Personal Views of a Scene, Perception, vol.25, issue.8, pp.931-948, 1996.

X. Ren and M. Philipose, Egocentric recognition of handled objects: Benchmark and analysis, Computer Vision and Pattern Recognition Workshop, 2009.

H. Sahbi, J. Y. Audibert, J. Rabarisoa, and R. Keriven, Robust matching and recognition using context-dependent kernels, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390264

URL : https://hal.archives-ouvertes.fr/hal-00834980

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), pp.32-36, 2004.
DOI : 10.1109/ICPR.2004.1334462

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
DOI : 10.1109/ICCV.2003.1238663

T. Starner, B. Schiele, and A. Pentland, Visual contextual awareness in wearable computing, Second International Symposium on Wearable Computers, 1998.
DOI : 10.1109/ISWC.1998.729529

M. J. Swain and D. H. Ballard, Color indexing, International Journal of Computer Vision, vol.7, issue.1, pp.11-32, 1991.
DOI : 10.1007/BF00130487

B. W. Tatler, The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions, Journal of Vision, vol.7, issue.14, pp.1-17, 2007.
DOI : 10.1167/7.14.4

A. M. Treisman and G. Gelade, A feature-integration theory of attention, Cognitive Psychology, vol.12, issue.1, pp.97-136, 1980.
DOI : 10.1016/0010-0285(80)90005-5

J. van Gemert, C. Veenman, A. Smeulders, and J. Geusebroek, Visual Word Ambiguity, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.7, pp.1271-1283, 2010.
DOI : 10.1109/TPAMI.2009.132

E. Vig, M. Dorr, and D. Cox, Space-Variant Descriptor Sampling for Action Recognition Based on Saliency and Eye Movements, European conference on Computer Vision, 2012.
DOI : 10.1007/978-3-642-33786-4_7

D. Wooding, Eye movements of large populations: II. Deriving regions of interest, coverage, and similarity using fixation maps, Behavior Research Methods, Instruments, & Computers, vol.34, issue.4, pp.518-528, 2002.
DOI : 10.3758/BF03195481

J. Yang, K. Yu, Y. Gong, and T. Huang, Linear spatial pyramid matching using sparse coding for image classification, IEEE Conference on Computer Vision and Pattern Recognition, 2009.