X. Alameda-Pineda, J. Sanchez-Riera, J. Wienke, V. Franc, J. Cech et al., RAVEL: an annotated corpus for training robots with audiovisual abilities, Journal on Multimodal User Interfaces, vol.24, issue.2, pp.79-91, 2013.
DOI : 10.1007/s12193-012-0111-y

URL : https://hal.archives-ouvertes.fr/hal-00720734

J. Alon, V. Athitsos, Q. Yuan, and S. Sclaroff, A Unified Framework for Gesture Recognition and Spatiotemporal Gesture Segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.9, pp.1685-1699, 2009.
DOI : 10.1109/TPAMI.2008.203

J. Blackburn and E. Ribeiro, Human Motion Recognition Using Isomap and Dynamic Time Warping, pp.285-298, 2007.
DOI : 10.1007/978-3-540-75703-0_20

S. Boyd and L. Vandenberghe, Convex Optimization, 2004.

W. Brendel and S. Todorovic, Activities as Time Series of Human Postures, p.10, 2010.
DOI : 10.1007/978-3-642-15552-9_52

S. Chen, D. Donoho, and M. Saunders, Atomic Decomposition by Basis Pursuit, SIAM Review, vol.43, issue.1, pp.129-159, 2001.
DOI : 10.1137/S003614450037906X

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.113.7694

G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints, ECCV Workshop on Statistical Learning in Computer Vision 16, 2004.

S. Escalera, J. Gonzàlez, X. Baró, M. Reyes, O. Lopes et al., Multimodal gesture recognition challenge 2013: Dataset and results, ChaLearn Multi-modal Gesture Recognition Grand Challenge and Workshop, 15th ACM International Conference on Multimodal Interaction 23, 2013.
DOI : 10.1145/2522848.2532595

URL : https://hal.archives-ouvertes.fr/hal-01381153

G. Evangelidis and E. Psarakis, Parametric Image Alignment Using Enhanced Correlation Coefficient Maximization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, issue.10, pp.1858-1865, 2008.
DOI : 10.1109/TPAMI.2008.113

URL : https://hal.archives-ouvertes.fr/hal-00864385

M. Gales and S. Young, The Application of Hidden Markov Models in Speech Recognition, Foundations and Trends® in Signal Processing, vol.1, issue.3, p.20, 2008.
DOI : 10.1561/2000000004

P. Gill, A. Wang, and A. Molnar, The In-Crowd Algorithm for Fast Basis Pursuit Denoising, IEEE Transactions on Signal Processing, vol.59, issue.10, pp.4595-4605, 2011.
DOI : 10.1109/TSP.2011.2161292

D. Gong and G. Medioni, Dynamic Manifold Warping for view invariant action recognition, 2011 International Conference on Computer Vision, pp.571-578, 2011.
DOI : 10.1109/ICCV.2011.6126290

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.369.7764

K. Kulkarni, G. Evangelidis, J. Cech, and R. Horaud

Fig. 13 (a) Failure of DFW (Run); (b) Failure of DFW (Fight-Person). This figure shows the test-frame-to-metaframe distance grids (warm colors correspond to large discrepancies) and alignment paths (white lines) for the Run and Fight-Person sequences of Hollywood-2. The top plots, (a) and (b), show the result of applying DFW. In this case, the algorithm failed to properly align the test sequences with the corresponding class templates. One may notice that there is no obvious "low-cost" path visible on these grids; therefore DFW fails to find a good alignment and a satisfactory score. The bottom plots, (c) and (d), show the result of applying OP-DFW to exactly the same test sequences, where the Run and Fight-Person actions are modeled by their constituent motion patterns. Hence, OP-DFW jumps from one motion pattern to the next one.

Examples of alignments with DFW and OP-DFW are shown in Fig

H. Hienz, B. Bauer, K. Kraiss, A. Braffort, R. Gherbi et al., HMM-Based Continuous Sign Language Recognition Using Stochastic Grammars, Lecture Notes in Computer Science, vol.1739, pp.185-196, 1999.
DOI : 10.1007/3-540-46616-9_17

M. Hoai, Z. Lan, and F. De la Torre, Joint segmentation and classification of human actions in video, CVPR 2011, pp.18-22, 2011.
DOI : 10.1109/CVPR.2011.5995470

N. Ikizler and P. Duygulu, Histogram of oriented rectangles: A new pose descriptor for human action recognition, Image and Vision Computing, vol.27, issue.10, p.10, 2009.
DOI : 10.1016/j.imavis.2009.02.002

M. Jain, H. Jégou, and P. Bouthémy, Better Exploiting Motion for Better Action Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp.2555-2562, 2013.
DOI : 10.1109/CVPR.2013.330

URL : https://hal.archives-ouvertes.fr/hal-00813014

Y. Jiang, Q. Dai, X. Xue, W. Liu, and C. Ngo, Trajectory-Based Modeling of Human Actions with Motion Reference Points, European Conference on Computer Vision, pp.425-438, 2012.
DOI : 10.1007/978-3-642-33715-4_31

K. Kulkarni, S. Cherla, A. Kale, and V. Ramasubramanian, A framework for indexing human actions in video, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00326719

Fig. 14 Isolated recognition alignments using DFW and OP-DFW for periodic actions. The figure shows the alignments between a test video, (a) and (d), and metaframes from the training data for the Run and Fight-Person examples shown in Fig

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.20-23, 2008.
DOI : 10.1109/CVPR.2008.4587756

URL : https://hal.archives-ouvertes.fr/inria-00548659

C. Lee and L. Rabiner, A frame-synchronous network search algorithm for connected word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.37, issue.11, pp.1649-1658, 1989.
DOI : 10.1109/29.46547

R. Liang and M. Ouhyoung, A real-time continuous gesture recognition system for sign language, Third IEEE International Conference on Automatic Face and Gesture Recognition, pp.558-567, 1998.

F. Lv and R. Nevatia, Recognition and Segmentation of 3-D Human Action Using HMM and Multi-class AdaBoost, European Conference on Computer Vision, pp.359-372, 2006.
DOI : 10.1007/11744085_28

F. Lv and R. Nevatia, Single View Human Action Recognition using Key Pose Matching and Viterbi Path Searching, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383131

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.309.9878

C. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, 2008.
DOI : 10.1017/CBO9780511809071

M. Marszalek, I. Laptev, and C. Schmid, Actions in context, 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp.2929-2936, 2009.
DOI : 10.1109/CVPR.2009.5206557

URL : https://hal.archives-ouvertes.fr/inria-00548645

L. Morency, A. Quattoni, and T. Darrell, Latent-dynamic discriminative models for continuous gesture recognition, Computer Vision and Pattern Recognition, pp.1-8, 2007.
DOI : 10.1109/cvpr.2007.383299

URL : http://dspace.mit.edu/bitstream/1721.1/35276/1/MIT-CSAIL-TR-2007-002.pdf

M. Mueller, Dynamic Time Warping, pp.69-84, 2007.
DOI : 10.1007/978-3-540-74048-3_4

H. Ney, The Use of a One-Stage Dynamic Programming Algorithm for Connected Word Recognition, IEEE Transactions on Acoustics, Speech and Signal Processing, vol.32, issue.2, pp.263-271, 1984.
DOI : 10.1016/B978-0-08-051584-7.50019-X

H. Ney and S. Ortmanns, Dynamic programming search for continuous speech recognition, IEEE Signal Processing Magazine, vol.16, issue.5, pp.64-83, 1999.
DOI : 10.1109/79.790984

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.4362

H. Ning, W. Xu, Y. Gong, and T. Huang, Latent Pose Estimator for Continuous Action Recognition, European Conference on Computer Vision, pp.419-433, 2008.
DOI : 10.1007/978-3-540-88688-4_31

L. Rabiner and B. Juang, Fundamentals of Speech Recognition, p.7, 1993.

H. Sakoe, Two-level DP-matching--A dynamic programming-based pattern matching algorithm for connected word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.27, issue.6, pp.588-595, 1979.
DOI : 10.1109/TASSP.1979.1163310

J. Sanchez-riera, J. Cech, and R. Horaud, Action Recognition Robust to Background Clutter by Using Stereo Vision, LNCS, vol.21, p.23, 2012.
DOI : 10.1007/978-3-642-33863-2_33

URL : https://hal.archives-ouvertes.fr/hal-00768670

Q. Shi, L. Wang, L. Cheng, and A. Smola, Human Action Segmentation and Recognition Using Discriminative Semi-Markov Models, International Journal of Computer Vision, vol.6, issue.4-5, pp.22-32, 2011.
DOI : 10.1007/s11263-010-0384-0

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.169.7159

L. Sigal, A. Balan, and M. Black, HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion, International Journal of Computer Vision, vol.74, issue.3, pp.4-27, 2010.
DOI : 10.1007/s11263-009-0273-6

J. Sivic and A. Zisserman, Efficient Visual Search of Videos Cast as Text Retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.4, pp.591-606, 2009.
DOI : 10.1109/TPAMI.2008.111

C. Sminchisescu, A. Kanaujia, and D. Metaxas, Conditional models for contextual human motion recognition, CVIU, vol.104, issue.2-3, pp.210-220, 2006.

B. Solmaz, S. Assari, and M. Shah, Classifying web videos using a global video descriptor, Machine vision and applications, pp.1473-1485, 2013.
DOI : 10.1007/s00138-012-0449-x

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.297.4388

T. Starner, J. Weaver, and A. Pentland, Real-time American sign language recognition using desk and wearable computer based video, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.20, issue.12, pp.1371-1375, 1998.
DOI : 10.1109/34.735811

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.125.8443

J. Tropp and A. Gilbert, Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit, IEEE Transactions on Information Theory, vol.53, issue.12, pp.4655-4666, 2007.
DOI : 10.1109/TIT.2007.909108

URL : http://authors.library.caltech.edu/9490/1/TROieeetit07.pdf

M. Ullah, S. Parizi, and I. Laptev, Improving bag-of-features action recognition with non-local cues, British Machine Vision Conference 21, p.23, 2010.
DOI : 10.5244/c.24.95

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.174.6987

D. Vail, M. Veloso, and J. Lafferty, Conditional random fields for activity recognition, Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems , AAMAS '07, pp.235-240, 2007.
DOI : 10.1145/1329125.1329409

T. Vintsyuk, Element-wise recognition of continuous speech composed of words from a specified dictionary, Cybernetics and Systems Analysis, vol.7, issue.2, pp.361-372, 1971.

C. Vogler and D. Metaxas, ASL recognition based on a coupling between HMMs and 3D motion analysis, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271), pp.363-369, 1998.
DOI : 10.1109/ICCV.1998.710744

C. Vogler and D. Metaxas, A Framework for Recognizing the Simultaneous Aspects of American Sign Language, Computer Vision and Image Understanding, vol.81, issue.3, pp.358-384, 2001.
DOI : 10.1006/cviu.2000.0895

H. Wang and C. Schmid, Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, p.23, 2013.
DOI : 10.1109/ICCV.2013.441

URL : https://hal.archives-ouvertes.fr/hal-00873267

S. Young, N. Russell, and J. Thornton, Token passing: a simple conceptual model for connected speech recognition systems, Tech. Rep., vol.38, 1989.

S. Young, P. Woodland, and W. Byrne, HTK: Hidden Markov model toolkit V1.5, Tech. Rep., 1993.

S. Young, G. Evermann, D. Kershaw, G. Moore, J. Odell et al., The HTK book, p.20, 2009.

F. Zhou and F. De la Torre, Canonical time warping for alignment of human behavior, 2009.