M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, The Pascal Visual Object Classes (VOC) Challenge, International Journal of Computer Vision, vol.88, issue.2, pp.303-338, 2010.
DOI : 10.1007/s11263-009-0275-4

K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman, The devil is in the details: an evaluation of recent feature encoding methods, Proceedings of the British Machine Vision Conference 2011, pp.1-12, 2011.
DOI : 10.5244/C.25.76

H. Wang, A. Klaser, C. Schmid, and C. Liu, Action recognition by dense trajectories, CVPR 2011, pp.3169-3176, 2011.
DOI : 10.1109/CVPR.2011.5995407
URL : https://hal.archives-ouvertes.fr/inria-00583818

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.
DOI : 10.1109/CVPR.2008.4587756
URL : https://hal.archives-ouvertes.fr/inria-00548659

C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), pp.32-36, 2004.
DOI : 10.1109/ICPR.2004.1334462

D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.
DOI : 10.1023/B:VISI.0000029664.99615.94

H. Bay, T. Tuytelaars, and L. Van Gool, SURF: Speeded up robust features, ECCV, pp.404-417, 2006.

N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, Conference on CVPR, pp.886-893, 2005.

N. Dalal, B. Triggs, and C. Schmid, Human Detection Using Oriented Histograms of Flow and Appearance, ECCV, pp.428-441, 2006.
DOI : 10.1023/A:1008162616689
URL : https://hal.archives-ouvertes.fr/inria-00548587

K. Mikolajczyk and C. Schmid, A performance evaluation of local descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, issue.10, pp.1615-1630, 2005.
DOI : 10.1109/TPAMI.2005.188
URL : https://hal.archives-ouvertes.fr/inria-00548227

E. Tola, V. Lepetit, and P. Fua, DAISY: An Efficient Dense Descriptor Applied to Wide-Baseline Stereo, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.5, pp.815-830, 2010.
DOI : 10.1109/TPAMI.2009.77

J. Davis and A. Bobick, The representation and recognition of action using temporal templates, Conference on CVPR, pp.928-934, 1997.

V. Kellokumpu, G. Zhao, and M. Pietikäinen, Texture Based Description of Movements for Activity Analysis, pp.206-213, 2008.

T. Ojala, M. Pietikäinen, and T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, issue.7, pp.971-987, 2002.
DOI : 10.1109/TPAMI.2002.1017623

V. Kellokumpu, G. Zhao, and M. Pietikäinen, Human activity recognition using a dynamic texture based method, pp.885-894, 2008.

L. Wang and D. Suter, Learning and Matching of Dynamic Shape Manifolds for Human Action Recognition, IEEE Transactions on Image Processing, vol.16, issue.6, p.1646, 2007.
DOI : 10.1109/TIP.2007.896661

M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, Actions as space-time shapes, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1, pp.1395-1402, 2005.
DOI : 10.1109/ICCV.2005.28

L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, Actions as Space-Time Shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.12, pp.2247-2253, 2007.
DOI : 10.1109/TPAMI.2007.70711

J. Liu, J. Luo, and M. Shah, Recognizing realistic actions from videos in the wild, Conference on CVPR, pp.1996-2003, 2009.

R. Polana and R. Nelson, Low level recognition of human motion, Proc. IEEE Workshop on Nonrigid and Articulate Motion, pp.77-82, 1994.

A. Efros, A. Berg, G. Mori, and J. Malik, Recognizing action at a distance, Proceedings Ninth IEEE International Conference on Computer Vision, pp.726-733, 2003.
DOI : 10.1109/ICCV.2003.1238420

B. D. Lucas and T. Kanade, An iterative image registration technique with an application to stereo vision, Proceedings of the 7th international joint conference on Artificial intelligence, pp.674-679, 1981.

A. Fathi and G. Mori, Action recognition by learning mid-level motion features, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.
DOI : 10.1109/CVPR.2008.4587735
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.8599

S. Danafar and N. Gheissari, Action Recognition for Surveillance Applications Using Optic Flow and SVM, pp.457-466, 2007.
DOI : 10.1007/978-3-540-76390-1_45

D. Tran and A. Sorokin, Human Activity Recognition with Metric Learning, ECCV, pp.548-561, 2008.
DOI : 10.1007/978-3-540-88682-2_42

S. Ali and M. Shah, Human Action Recognition in Videos Using Kinematic Features and Multiple Instance Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, issue.2, pp.288-303, 2010.
DOI : 10.1109/TPAMI.2008.284

P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, Behavior recognition via sparse spatio-temporal features, 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp.65-72, 2005.

Table 10: Mean Average Precision on the UCF11 dataset; ND: number of descriptors; NL: non-linear classifiers. In [52], HOG/HOF descriptors are accumulated on over 100 spatio-temporal regions, each one leading to a different BoW signature.

A. Klaser, M. Marszalek, and C. Schmid, A Spatio-Temporal Descriptor Based on 3D-Gradients, Proceedings of the British Machine Vision Conference 2008, 2008.
DOI : 10.5244/C.22.99
URL : https://hal.archives-ouvertes.fr/inria-00514853

P. Scovanner, S. Ali, and M. Shah, A 3-dimensional SIFT descriptor and its application to action recognition, Proceedings of the 15th International Conference on Multimedia (MULTIMEDIA '07), pp.357-360, 2007.
DOI : 10.1145/1291233.1291311

G. Willems, T. Tuytelaars, and L. Van Gool, An efficient dense and scale-invariant spatio-temporal interest point detector, ECCV, pp.650-663, 2008.
DOI : 10.1007/978-3-540-88688-4_48

O. Kihl, B. Tremblais, B. Augereau, and M. Khoudeir, Human activities discrimination with motion approximation in polynomial bases, 2010 IEEE International Conference on Image Processing, pp.2469-2472, 2010.
DOI : 10.1109/ICIP.2010.5651327
URL : https://hal.archives-ouvertes.fr/hal-00594762

V. F. Mota, E. Perez, M. B. Vieira, L. Maciel, F. Precioso et al., A Tensor Based on Optical Flow for Global Description of Motion in Videos, 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, p.298, 2012.
DOI : 10.1109/SIBGRAPI.2012.48
URL : https://hal.archives-ouvertes.fr/hal-00753159

O. Kihl, D. Picard, and P. Gosselin, Local polynomial space-time descriptors for action classification, Machine Vision and Applications, vol.2010, issue.6, 2013.
DOI : 10.1007/s00138-014-0652-z
URL : https://hal.archives-ouvertes.fr/hal-01097536

J. Sánchez, F. Perronnin, and T. D. Campos, Modeling the spatial layout of images beyond spatial pyramids, Pattern Recognition Letters, vol.33, issue.16
DOI : 10.1016/j.patrec.2012.07.019

H. Wang, M. M. Ullah, A. Klaser, I. Laptev, and C. Schmid, Evaluation of local spatio-temporal features for action recognition, Proceedings of the British Machine Vision Conference 2009, 2009.
DOI : 10.5244/C.23.124
URL : https://hal.archives-ouvertes.fr/inria-00439769

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, pp.1470-1477, 2003.
DOI : 10.1109/ICCV.2003.1238663

J. Wang, J. Yang, K. Yu, F. Lv, T. Huang et al., Locality-constrained Linear Coding for image classification, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.3360-3367, 2010.
DOI : 10.1109/CVPR.2010.5540018

J. Yang, K. Yu, Y. Gong, and T. Huang, Linear spatial pyramid matching using sparse coding for image classification, Conference on CVPR, pp.1794-1801, 2009.

S. Avila, N. Thome, M. Cord, E. Valle, A. De et al., BOSSA: Extended BoW formalism for image classification, 2011 18th IEEE International Conference on Image Processing, pp.2909-2912, 2011.
DOI : 10.1109/ICIP.2011.6116268
URL : https://hal.archives-ouvertes.fr/hal-00625533

H. Jégou, M. Douze, C. Schmid, and P. Pérez, Aggregating local descriptors into a compact image representation, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.3304-3311, 2010.
DOI : 10.1109/CVPR.2010.5540039

X. Zhou, K. Yu, T. Zhang, and T. Huang, Image classification using supervector coding of local image descriptors, 2010.

H. Jégou, F. Perronnin, M. Douze, J. Sánchez, P. Pérez et al., Aggregating Local Image Descriptors into Compact Codes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.34, issue.9, pp.1704-1716, 2012.
DOI : 10.1109/TPAMI.2011.235

D. Picard and P. Gosselin, Efficient image signatures and similarities using tensor products of local descriptors, Computer Vision and Image Understanding, vol.117, issue.6, pp.680-687, 2013.
DOI : 10.1016/j.cviu.2013.02.004
URL : https://hal.archives-ouvertes.fr/hal-00799074

D. Picard and P. Gosselin, Improving image similarity with vectors of locally aggregated tensors, 2011 18th IEEE International Conference on Image Processing, pp.669-672, 2011.
DOI : 10.1109/ICIP.2011.6116641
URL : https://hal.archives-ouvertes.fr/hal-00591993

R. Negrel, D. Picard, and P. Gosselin, Using spatial pyramids with compacted vlat for image categorization, pp.2460-2463, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00753158

M. Varma and A. Zisserman, A Statistical Approach to Material Classification Using Image Patch Exemplars, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.11, pp.2032-2047, 2009.
DOI : 10.1109/TPAMI.2008.182

R. Negrel, D. Picard, and P. Gosselin, Compact tensor based image representation for similarity search, 2012 19th IEEE International Conference on Image Processing, pp.2425-2428, 2012.
DOI : 10.1109/ICIP.2012.6467387
URL : https://hal.archives-ouvertes.fr/hal-00753157

S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), pp.2169-2178, 2006.
DOI : 10.1109/CVPR.2006.68
URL : https://hal.archives-ouvertes.fr/inria-00548585

L. Li, H. Su, E. P. Xing, and L. Fei-Fei, Object bank: A high-level image representation for scene classification and semantic feature sparsification, Advances in Neural Information Processing Systems 24.

B. Horn and B. Schunck, Determining optical flow, Artificial Intelligence, vol.17, issue.1-3, pp.185-203, 1981.
DOI : 10.1016/0004-3702(81)90024-2

A. Gilbert, J. Illingworth, and R. Bowden, Action Recognition Using Mined Hierarchical Compound Features, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, issue.5, pp.883-897, 2011.
DOI : 10.1109/TPAMI.2010.144
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.301.1835

M. Ullah, S. Parizi, and I. Laptev, Improving bag-of-features action recognition with non-local cues, Proceedings of the British Machine Vision Conference 2010, 2010.
DOI : 10.5244/C.24.95

C. Farabet, C. Couprie, L. Najman, and Y. LeCun, Learning Hierarchical Features for Scene Labeling, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, issue.8, pp.1915-1929, 2013.
DOI : 10.1109/TPAMI.2012.231
URL : https://hal.archives-ouvertes.fr/hal-00742077

A. Rakotomamonjy, R. Flamary, and F. Yger, Learning with infinitely many features, Machine Learning, vol.44, issue.7, pp.43-66, 2013.
DOI : 10.1007/s10994-012-5324-5
URL : https://hal.archives-ouvertes.fr/hal-00880945