DIFFRAC: A discriminative and flexible framework for clustering, NIPS, p.7, 2007. ,
Finding Actors and Actions in Movies, 2013 IEEE International Conference on Computer Vision, 2013. ,
DOI : 10.1109/ICCV.2013.283
URL : https://hal.archives-ouvertes.fr/hal-00904991
Weakly Supervised Action Labeling in Videos under Ordering Constraints, ECCV, 2014. ,
DOI : 10.1007/978-3-319-10602-1_41
URL : https://hal.archives-ouvertes.fr/hal-01053967
Weakly-Supervised Alignment of Video with Text, 2015 IEEE International Conference on Computer Vision (ICCV), 2015. ,
DOI : 10.1109/ICCV.2015.507
URL : https://hal.archives-ouvertes.fr/hal-01154523
Unsupervised learning of narrative event chains, ACL, 2008. ,
On pairwise costs for network flow multi-object tracking, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. ,
DOI : 10.1109/CVPR.2015.7299193
URL : http://arxiv.org/abs/1408.3304
Deep filter banks for texture recognition and segmentation, CVPR, 2015. ,
DOI : 10.1109/cvpr.2015.7299007
URL : https://hal.archives-ouvertes.fr/hal-01263622
Generating typed dependency parses from phrase structure parses, LREC, p.9, 2006. ,
Automatic annotation of human actions in video, 2009 IEEE 12th International Conference on Computer Vision, 2009. ,
DOI : 10.1109/ICCV.2009.5459279
WordNet: An electronic lexical database, 1998. ,
A Hierarchical Bayesian Model for Unsupervised Induction of Script Knowledge, Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014. ,
DOI : 10.3115/v1/E14-1006
CLUSTAL: a package for performing multiple sequence alignment on a microcomputer, Gene, vol.73, issue.1, p.9, 1988. ,
DOI : 10.1016/0378-1119(88)90330-7
Random Design Analysis of Ridge Regression, Foundations of Computational Mathematics, vol.17, issue.36, 2014. ,
DOI : 10.1162/0899766054323008
Revisiting Frank-Wolfe: Projection-free sparse convex optimization, ICML, 2013. ,
Efficient image and video colocalization with Frank-Wolfe algorithm, ECCV, 2014. ,
DOI : 10.1007/978-3-319-10599-4_17
URL : http://ai.stanford.edu/%7Ekdtang/papers/eccv14-vidcoloc.pdf
Convergence rate of Frank-Wolfe for non-convex objectives. arXiv preprint, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01415335
On the global linear convergence of Frank-Wolfe optimization variants, NIPS, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01248675
Block-coordinate Frank-Wolfe optimization for structural SVMs, Proceedings of the International Conference on Machine Learning (ICML), 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00720158
Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008. ,
DOI : 10.1109/CVPR.2008.4587756
URL : https://hal.archives-ouvertes.fr/inria-00548659
Multiple sequence alignment using partial order graphs, Bioinformatics, vol.18, issue.3, 2002. ,
DOI : 10.1093/bioinformatics/18.3.452
URL : https://academic.oup.com/bioinformatics/article-pdf/18/3/452/648375/180452.pdf
Clustering of time series data, a survey, Pattern recognition, issue.10, 2014. ,
What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015. ,
DOI : 10.3115/v1/N15-1015
URL : http://arxiv.org/abs/1503.01558
Learning from video and text via large-scale discriminative clustering, ICCV, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01569540
Distributed representations of words and phrases and their compositionality, NIPS, 2013. ,
WordNet: A lexical database for English, Communications of the ACM, issue.5, 1995. ,
DOI : 10.1145/219717.219748
URL : http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.1823&rep=rep1&type=pdf
Discriminative Unsupervised Alignment of Natural Language Instructions with Corresponding Video Segments, Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015. ,
DOI : 10.3115/v1/N15-1017
Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification, ECCV, 2010. ,
DOI : 10.1007/978-3-642-15552-9_29
Unsupervised learning of human action categories using spatial-temporal words. IJCV, 2008. ,
Minding the gaps for block Frank-Wolfe optimization of structured SVMs, Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01323727
Category-Specific Video Summarization, ECCV, 2014. ,
DOI : 10.1007/978-3-319-10599-4_35
URL : https://hal.archives-ouvertes.fr/hal-01022967
Poselet Key-Framing: A Model for Human Activity Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013. ,
DOI : 10.1109/CVPR.2013.342
Learning script knowledge with Web experiments, ACL, 2010. ,
Unsupervised Semantic Parsing of Video Collections, 2015 IEEE International Conference on Computer Vision (ICCV), 2015. ,
DOI : 10.1109/ICCV.2015.509
URL : http://arxiv.org/abs/1506.08438
Very deep convolutional networks for large-scale image recognition, ICLR, 2015. ,
Ranking Domain-Specific Highlights by Analyzing Edited Videos, ECCV, 2014. ,
DOI : 10.1007/978-3-319-10590-1_51
Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision, 2013. ,
DOI : 10.1109/ICCV.2013.441
URL : https://hal.archives-ouvertes.fr/hal-00873267
On the Complexity of Multiple Sequence Alignment, Journal of Computational Biology, vol.1, issue.4, pp.337-348, 1994. ,
DOI : 10.1089/cmb.1994.1.337
Alayrac received the MS degree in computer science from École Normale Supérieure (ENS), Paris, in 2014. He is currently working toward the PhD degree in the WILLOW and SIERRA research teams at INRIA Paris under the supervision of Josef Sivic, Ivan Laptev and Simon Lacoste-Julien. His research focuses on structured prediction from vision and natural language.
Bojanowski is a postdoctoral researcher at Facebook AI Research. He received his Ph.D. in 2016 from the WILLOW team at INRIA Paris. His work focuses on automatic video and image understanding.