M. R. Amer, S. Todorovic, A. Fern, and S. C. Zhu, Monte Carlo Tree Search for Scheduling Activity Recognition, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.171
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.405.5916

F. Bach and Z. Harchaoui, DIFFRAC: a discriminative and flexible framework for clustering, NIPS, 2007.

D. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.

P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid et al., Finding Actors and Actions in Movies, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.283
URL : https://hal.archives-ouvertes.fr/hal-00904991

P. Bojanowski, R. Lajugie, F. Bach, I. Laptev, J. Ponce et al., Weakly Supervised Action Labeling in Videos under Ordering Constraints, ECCV, 2014.
DOI : 10.1007/978-3-319-10602-1_41
URL : https://hal.archives-ouvertes.fr/hal-01053967

O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce, Automatic annotation of human actions in video, 2009 IEEE 12th International Conference on Computer Vision, 2009.
DOI : 10.1109/ICCV.2009.5459279

M. Frank and P. Wolfe, An algorithm for quadratic programming, Naval Research Logistics Quarterly, vol.3, issue.1-2, 1956.
DOI : 10.1002/nav.3800030109

B. Gold, N. Morgan, and D. Ellis, Speech and Audio Signal Processing: Processing and Perception of Speech and Music, 2011.

Y. Guo and D. Schuurmans, Convex Relaxations of Latent Variable Training, NIPS, 2007.

Z. Harchaoui, Conditional gradient algorithms for machine learning, NIPS Workshop, 2012.

T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning: data mining, inference and prediction, 2009.

S. Hongeng and R. Nevatia, Large-scale event detection using semi-hidden Markov models, ICCV, 2003.

L. Hubert and P. Arabie, Comparing partitions, Journal of Classification, vol.2, issue.1, 1985.
DOI : 10.1007/BF01908075

Y. A. Ivanov and A. F. Bobick, Recognition of visual activities and interactions by stochastic parsing, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.22, issue.8, 2000.
DOI : 10.1109/34.868686

P. Jaccard, The Distribution of the Flora in the Alpine Zone, New Phytologist, vol.11, issue.2, 1912.
DOI : 10.1111/j.1469-8137.1912.tb05611.x

M. Jaggi, Revisiting Frank-Wolfe: Projection-free sparse convex optimization, ICML, 2013.

A. Joulin, F. Bach, and J. Ponce, Discriminative Clustering for Image Cosegmentation, CVPR, 2010.

A. Joulin, F. Bach, and J. Ponce, Multi-class cosegmentation, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247719
URL : https://hal.archives-ouvertes.fr/hal-00717448

S. Khamis, V. I. Morariu, and L. S. Davis, Combining Per-frame and Per-track Cues for Multi-person Action Recognition, ECCV, 2012.
DOI : 10.1007/978-3-642-33718-5_9
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.250.4018

S. Kwak, B. Han, and J. H. Han, Scenario-based video event recognition by constraint flow, CVPR, 2011.
DOI : 10.1109/CVPR.2011.5995435
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.204.6190

I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
DOI : 10.1109/CVPR.2008.4587756
URL : https://hal.archives-ouvertes.fr/inria-00548659

B. Laxton, J. Lim, and D. J. Kriegman, Leveraging temporal, contextual and ordering constraints for recognizing complex activities in video, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007.
DOI : 10.1109/CVPR.2007.383074

J. Liu, B. Kuipers, and S. Savarese, Recognizing human actions by attributes, CVPR, 2011.
DOI : 10.1109/CVPR.2011.5995353

M. H. Nguyen, Z. Z. Lan, and F. De la Torre, Joint segmentation and classification of human actions in video, CVPR, 2011.

J. C. Niebles, C. W. Chen, and L. Fei-Fei, Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification, ECCV, 2010.
DOI : 10.1007/978-3-642-15552-9_29

L. R. Rabiner and B. H. Juang, Fundamentals of speech recognition, 1993.

M. Rohrbach, M. Regneri, M. Andriluka, S. Amin, M. Pinkal et al., Script Data for Attribute-Based Recognition of Composite Activities, ECCV, 2012.
DOI : 10.1007/978-3-642-33718-5_11

M. S. Ryoo and J. K. Aggarwal, Recognition of Composite Human Activities through Context-Free Grammar Based Representation, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2 (CVPR'06), 2006.
DOI : 10.1109/CVPR.2006.242

S. Sadanand and J. J. Corso, Action bank: A high-level representation of activity in video, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247806

J. Shi and J. Malik, Normalized Cuts and Image Segmentation, CVPR, 1997.

J. Sivic, M. Everingham, and A. Zisserman, "Who are you?" - Learning person specific classifiers from video, CVPR, 2009.
DOI : 10.1109/cvpr.2009.5206513

K. Tang, L. Fei-Fei, and D. Koller, Learning latent temporal structure for complex event detection, 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
DOI : 10.1109/CVPR.2012.6247808

V. T. Vu, F. Bremond, and M. Thonnat, Automatic video interpretation: A novel algorithm for temporal scenario recognition, IJCAI, 2003.

H. Wang, A. Kläser, C. Schmid, and C. L. Liu, Action recognition by dense trajectories, CVPR, 2011.
DOI : 10.1109/CVPR.2011.5995407
URL : https://hal.archives-ouvertes.fr/inria-00583818

H. Wang and C. Schmid, Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, 2013.
DOI : 10.1109/ICCV.2013.441
URL : https://hal.archives-ouvertes.fr/hal-00873267