S. T. Strat, A. Benoit, P. Lambert, and A. Caplier, Retina enhanced SURF descriptors for spatio-temporal concept detection, Multimedia Tools and Applications, pp.1-27, 2013.

S. T. Strat, A. Benoit, P. Lambert, H. Bredin et al., Hierarchical late fusion for concept detection in videos (extended book chapter), 2012.

S. T. Strat, A. Benoit, P. Lambert, and A. Caplier, Retina-enhanced SURF descriptors for semantic concept detection in videos, 2012 3rd International Conference on Image Processing Theory, Tools and Applications (IPTA), pp.319-324, 2012.
DOI : 10.1109/IPTA.2012.6469557

S. T. Strat, A. Benoit, H. Bredin, G. Quénot, and P. Lambert, Hierarchical Late Fusion for Concept Detection in Videos, Proceedings of the European Conference on Computer Vision (ECCV 2012), Oral session 1, WS21: Workshop on Information Fusion in Computer Vision for Concept Recognition, pp.335-344, 2012.
DOI : 10.1007/978-3-642-33885-4_34

URL : https://hal.archives-ouvertes.fr/hal-00981688

S. T. Strat, A. Benoit, and P. Lambert, Retina enhanced SIFT descriptors for video indexing, 2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI), pp.201-206, 2013.
DOI : 10.1109/CBMI.2013.6576582

IRIM at TRECVID 2012: Semantic Indexing and Instance Search, 2012 TREC Video Retrieval Evaluation Notebook Papers and Slides, 2012.

Bibliography

A. Alahi, R. Ortiz, and P. Vandergheynst, FREAK: Fast Retina Keypoint, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012. CVPR 2012 Open Source Award Winner.

A. Ali, E. Debreuve, P. Kornprobst, and M. Barlaud, Bioinspired Bags-of-features for Image Classification, In KDIR, pp.277-281, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00845745

D. Arthur and S. Vassilvitskii, k-means++: the advantages of careful seeding, SODA, pp.1027-1035, 2007.

S. Ayache, G. Quénot, and J. Gensel, Image and Video Indexing Using Networks of Operators, EURASIP Journal on Image and Video Processing, vol.5, issue.1, pp.1-1, 2007.
DOI : 10.1109/MMUL.2006.63

N. Ballas, B. Delezoide, and F. Prêteux, Trajectories based descriptor for dynamic events annotation, Proceedings of the 2011 joint ACM workshop on Modeling and representing events, J-MRE '11, pp.13-18, 2011.
DOI : 10.1145/2072508.2072512

N. Ballas, B. Labbé, A. Shabou, and H. Le Borgne, CEA LIST at TRECVID 2012: Semantic Indexing and Instance Search, Proc. TRECVID Workshop, p.26, 2012.

N. Ballas, B. Labbé, A. Shabou, H. Le Borgne, P. Gosselin et al., IRIM at TRECVID 2012: Semantic Indexing and Instance Search, Proceedings of the workshop on TREC Video Retrieval Evaluation (TRECVID), 12 p., pp.13-15, 2012.

M. Chen and A. Hauptmann, MoSIFT: Recognizing Human Actions in Surveillance Videos, pp.25-45, 2009.

V. Cliville, L. Berrah, and G. Mauris, Information fusion in industrial performance: a 2-additive Choquet-integral based approach, 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583), pp.1297-1302, 2004.
DOI : 10.1109/ICSMC.2004.1399804

G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints, Workshop on Statistical Learning in Computer Vision, ECCV, pp.1-22, 2004.

S. Daly, A visual model for optimizing the design of image processing algorithms, Proceedings of 1st International Conference on Image Processing, pp.16-20, 1994.
DOI : 10.1109/ICIP.1994.413522

R. de Carvalho Soares, I. R. da Silva, and D. Guliato, Spatial Locality Weighting of Features Using Saliency Map with a Bag-of-Visual-Words Approach, 2012 IEEE 24th International Conference on Tools with Artificial Intelligence, pp.1070-1075, 2012.
DOI : 10.1109/ICTAI.2012.151

B. Delezoide, F. Precioso, P. Gosselin, M. Redi, B. Mérialdo et al., IRIM at TRECVID 2011: Semantic indexing and instance search, TRECVID 2011, 15th International Workshop on Video Retrieval Evaluation, pp.2011-80, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00677651

J. Deng, W. Dong, R. Socher, L. Li, K. Li et al., ImageNet: A Large-Scale Hierarchical Image Database, CVPR09, 2009.

P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, Behavior Recognition via Sparse Spatio-Temporal Features, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp.65-72, 2005.
DOI : 10.1109/VSPETS.2005.1570899

M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010.
DOI : 10.1007/s11263-009-0275-4

R. Ewerth, M. Schwalb, P. Tessmann, and B. Freisleben, Estimation of arbitrary camera motion in MPEG videos, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., pp.512-515, 2004.
DOI : 10.1109/ICPR.2004.1334181

L. Fei-Fei, R. Fergus, and P. Perona, Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories, 2004 Conference on Computer Vision and Pattern Recognition Workshop, pp.59-70, 2007.
DOI : 10.1109/CVPR.2004.383

M. A. Fischler and R. C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM, vol.24, issue.6, pp.381-395, 1981.

D. J. Fleet and Y. Weiss, Optical Flow Estimation, 2005.

Y. Freund and R. E. Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, vol.55, issue.1, pp.119-139, 1997.
DOI : 10.1006/jcss.1997.1504

A. Gaidon, Z. Harchaoui, and C. Schmid, Actom sequence models for efficient action detection, CVPR 2011, pp.3201-3208, 2011.
DOI : 10.1109/CVPR.2011.5995646

URL : https://hal.archives-ouvertes.fr/inria-00575217

I. González-Díaz, V. Buso, J. Benois-Pineau, G. Bourmaud, and R. Megret, Modeling Instrumental Activities of Daily Living in Egocentric Vision As Sequences of Active Objects and Context for Alzheimer Disease Research, Proceedings of the 1st ACM International Workshop on Multimedia Indexing and Information Retrieval for Healthcare, MIIRH '13, pp.11-14, 2013.

D. Gorisse, F. Precioso, P. Gosselin, L. Granjon, D. Pellerin et al., IRIM at TRECVID 2010: Semantic Indexing and Instance Search, TREC online proceedings, pp.51-53, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00591099

G. Griffin, A. Holub, and P. Perona, Caltech-256 Object Category Dataset, 2007.

M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann et al., The WEKA data mining software, ACM SIGKDD Explorations Newsletter, vol.11, issue.1, pp.10-18, 2009.
DOI : 10.1145/1656274.1656278

L. Itti, C. Koch, and E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.20, issue.11, pp.1254-1259, 1998.
DOI : 10.1109/34.730558

D. J. Jobson, Z. Rahman, and G. A. Woodell, A multiscale retinex for bridging the gap between color images and the human observation of scenes, IEEE Transactions on Image Processing, vol.6, issue.7, pp.965-976, 1997.
DOI : 10.1109/83.597272

J. Luo and O. Gwun, A Comparison of SIFT, PCA-SIFT and SURF, International Journal of Image Processing (IJIP), vol.3, issue.4, pp.143-152, 2009.

S. Little, I. Jargalsaikhan, C. Direkoglu, N. E. O'Connor, A. F. Smeaton et al., Interactive Surveillance Event Detection, TRECVid Workshop, 2012.

L. Liu, L. Wang, and X. Liu, In defense of soft-assignment coding, Computer Vision (ICCV), 2011 IEEE International Conference on, pp.2486-2493, 2011.

D. G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94

R. Mantiuk, S. Daly, K. Myszkowski, and H.-P. Seidel, Predicting visible differences in high dynamic range images: model and its calibration, Human Vision and Electronic Imaging X, pp.204-214, 2005.
DOI : 10.1117/12.586757

S. Marat, T. Ho-Phuoc, L. Granjon, N. Guyader, D. Pellerin, and A. Guérin-Dugué, Spatio-temporal saliency model to predict eye movements in video free viewing, Proceedings of the 16th European Signal Processing Conference, pp.1-5, 2008.

M. Marszalek, I. Laptev, and C. Schmid, Actions in context, 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp.2929-2936, 2009.
DOI : 10.1109/CVPR.2009.5206557

URL : https://hal.archives-ouvertes.fr/inria-00548645

F. Moosmann, D. Larlus, and F. Jurie, Learning saliency maps for object categorization, International Workshop on The Representation and Use of Prior Knowledge in Vision (ECCV '06), 2006.

M. Muja and D. G. Lowe, Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration, International Conference on Computer Vision Theory and Application (VISAPP'09), pp.331-340, 2009.

R. Negrel, D. Picard, and P. Gosselin, Compact tensor based image representation for similarity search, 2012 19th IEEE International Conference on Image Processing, pp.2425-2428, 2012.
DOI : 10.1109/ICIP.2012.6467387

URL : https://hal.archives-ouvertes.fr/hal-00753157

K. B. Ng and P. B. Kantor, Predicting the Effectiveness of Naive Data Fusion on the Basis of System Characteristics, Journal of the American Society for Information Science, vol.51, pp.1177-1189, 2000.

J. C. Niebles, H. Wang, and L. Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, International Journal of Computer Vision, vol.79, issue.3, pp.299-318, 2008.
DOI : 10.1007/s11263-007-0122-4

E. Nowak, F. Jurie, and B. Triggs, Sampling strategies for bag-of-features image classification, Proceedings of the 9th European conference on Computer Vision - Volume Part IV, ECCV'06, pp.490-503, 2006.

T. Ojala, M. Pietikäinen, and D. Harwood, A comparative study of texture measures with classification based on featured distributions, Pattern Recognition, vol.29, issue.1, pp.51-59, 1996.
DOI : 10.1016/0031-3203(95)00067-4

A. Alahi, R. Ortiz, and P. Vandergheynst, FREAK: Fast Retina Keypoint, Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), CVPR '12, pp.510-517, 2012.

P. Over, G. Awad, M. Michel, J. Fiscus, W. Kraaij et al., TRECVID 2011 - An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics, Proceedings of TRECVID 2011, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00763912

P. Over, G. Awad, M. Michel, J. Fiscus, G. Sanders et al., TRECVID 2012 - An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics, Proceedings of TRECVID 2012. NIST, pp.10-30, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00953826

J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, Image Classification with the Fisher Vector: Theory and Practice, International Journal of Computer Vision, vol.105, issue.3, pp.222-245, 2013.
DOI : 10.1007/s11263-013-0636-x

S. Savarese, J. Winn, and A. Criminisi, Discriminative Object Class Models of Appearance and Shape by Correlatons, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), pp.2033-2040, 2006.
DOI : 10.1109/CVPR.2006.102

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., pp.32-36, 2004.
DOI : 10.1109/ICPR.2004.1334462

H. Senane, A. Saadane, and D. Barba, Design and Evaluation of an Entirely Psychovisual-Based Coding Scheme, Journal of Visual Communication and Image Representation, vol.12, issue.4, pp.401-421, 2001.
DOI : 10.1006/jvci.2001.0489

J. Shi and C. Tomasi, Good Features to Track, 1994 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'94), pp.593-600, 1994.

A. F. Smeaton, P. Over, and A. R. Doherty, Video shot boundary detection: Seven years of TRECVid activity, Computer Vision and Image Understanding, vol.114, issue.4, pp.411-418, 2010.
DOI : 10.1016/j.cviu.2009.03.011

J. M. Sprague and T. H. Meikle Jr., The role of the superior colliculus in visually guided behavior, Experimental Neurology, vol.11, issue.1, pp.115-146, 1965.

S. T. Strat, A. Benoit, P. Lambert, and A. Caplier, Retina-enhanced SURF descriptors for semantic concept detection in videos, 2012 3rd International Conference on Image Processing Theory, Tools and Applications (IPTA), pp.319-324, 2012.
DOI : 10.1109/IPTA.2012.6469557

URL : https://hal.archives-ouvertes.fr/hal-00732736

S. T. Strat, A. Benoit, H. Bredin, G. Quénot, and P. Lambert, Hierarchical Late Fusion for Concept Detection in Videos, Proceedings of Computer Vision - ECCV 2012. Workshops and Demonstrations, Part III, pp.335-344, 2012.
DOI : 10.1007/978-3-642-33885-4_34

URL : https://hal.archives-ouvertes.fr/hal-00981688

S. T. Strat, A. Benoit, P. Lambert, and A. Caplier, Retina enhanced SURF descriptors for spatio-temporal concept detection, Multimedia Tools and Applications, pp.1-27, 2013.
DOI : 10.1007/s11042-012-1280-0

URL : https://hal.archives-ouvertes.fr/hal-00760192

S. T. Strat, A. Benoit, and P. Lambert, Retina enhanced SIFT descriptors for video indexing, 2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI), pp.201-206, 2013.
DOI : 10.1109/CBMI.2013.6576582

URL : https://hal.archives-ouvertes.fr/hal-00875044

E. B. Sudderth, A. Torralba, W. T. Freeman, and A. S. Willsky, Learning hierarchical models of scenes, objects, and parts, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, pp.1331-1338, 2005.
DOI : 10.1109/ICCV.2005.137

C. Tanase and B. Merialdo, Introducing motion information in dense feature classifiers, 2013 14th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), pp.1-4, 2013.
DOI : 10.1109/WIAMIS.2013.6616132

Z. Tang and K. Yanai, UEC at TRECVID 2008 High Level Feature Task, in P. Over, G. Awad, R. Travis Rose, J. G. Fiscus, W. Kraaij et al. (eds.), TRECVID, National Institute of Standards and Technology (NIST), 2008.

K. Tran, I. A. Kakadiaris, and S. K. Shah, Part-based motion descriptor image for human action recognition, Pattern Recognition, vol.45, issue.7, pp.2562-2572, 2012.
DOI : 10.1016/j.patcog.2011.12.028

M. R. Turner, Texture discrimination by Gabor functions, Biol. Cybern., vol.55, pp.71-82, 1986.