ImageNet: A large-scale hierarchical image database, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.248-255, 2009.
Learning deep features for scene recognition using Places database, Neural Information Processing Systems (NIPS), pp.487-495, 2014.
ActivityNet: A large-scale video benchmark for human activity understanding, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.961-970, 2015.
DOI: 10.1109/CVPR.2015.7298698
URL: http://repository.kaust.edu.sa/kaust/bitstream/10754/556141/1/ActivityNet_CVPR2015.pdf
Recognizing realistic actions from videos in the wild.
THUMOS challenge: Action recognition with a large number of classes, 2015.
Large-scale video classification with convolutional neural networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol.4, pp.1725-1732, 2014.
DOI: 10.1109/cvpr.2014.223
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.471.3312
HMDB: A large video database for human motion recognition, 2011 International Conference on Computer Vision (ICCV), pp.2556-2563, 2011.
DOI: 10.1109/ICCV.2011.6126543
UCF101: A dataset of 101 human actions classes from videos in the wild, 2012.
Learning realistic human actions from movies, 2008 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-8, 2008.
DOI: 10.1109/CVPR.2008.4587756
URL: https://hal.archives-ouvertes.fr/inria-00548659
Action MACH: A spatio-temporal Maximum Average Correlation Height filter for action recognition, 2008 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-8, 2008.
DOI: 10.1109/CVPR.2008.4587727
A dataset for Movie Description, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
DOI: 10.1109/CVPR.2015.7298940
Recognizing human actions: a local SVM approach, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), pp.32-36, 2004.
DOI: 10.1109/ICPR.2004.1334462
Actions as Space-Time Shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, issue.12, pp.2247-2253, 2007.
DOI: 10.1109/TPAMI.2007.70711
A database for fine grained activity detection of cooking activities, 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1194-1201, 2012.
DOI: 10.1109/CVPR.2012.6247801
A large-scale benchmark dataset for event recognition in surveillance video, 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3153-3160, 2011.
DOI: 10.1109/CVPR.2011.5995586
The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities, 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
DOI: 10.1109/CVPR.2014.105
Coherent Multi-sentence Video Description with Variable Level of Detail, Pattern Recognition (GCPR 2014), pp.184-195, 2014.
DOI: 10.1007/978-3-319-11752-2_15
Actions in context, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol.3, 2009.
2D Human Pose Estimation in TV Shows, pp.128-147, 2009.
DOI: 10.1007/978-3-642-03061-1_7
Collecting highly parallel data for paraphrase evaluation, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp.190-200, 2011.
Using descriptive video services to create a large data source for video annotation research.
Objects in Action: An Approach for Combining Action Understanding and Object Perception, 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.1-8, 2007.
DOI: 10.1109/CVPR.2007.383331
Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities, 2009 IEEE 12th International Conference on Computer Vision (ICCV), pp.1593-1600, 2009.
DOI: 10.1109/ICCV.2009.5459361
PhotoCity, Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems (CHI '11), pp.1383-1392, 2011.
DOI: 10.1145/1978942.1979146
Detecting activities of daily living in first-person camera views, 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.2847-2854, 2012.
DOI: 10.1109/CVPR.2012.6248010
First-Person Animal Activity Recognition from Egocentric Videos, 2014 22nd International Conference on Pattern Recognition (ICPR), 2014.
DOI: 10.1109/ICPR.2014.739
Bringing Semantics into Focus Using Visual Abstraction, 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.3009-3016, 2013.
DOI: 10.1109/CVPR.2013.387
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.306.7749
Introduction to modern information retrieval, pp.24-51, 1983.
Much ado about time: Exhaustive annotation of temporal data, arXiv preprint arXiv:1607.07429, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01431527
The psycho-biology of language, p.7, 1935.
Visualizing data using t-SNE, Journal of Machine Learning Research, vol.9, issue.85, pp.2579-2605, 2008.
Action Recognition with Improved Trajectories, 2013 IEEE International Conference on Computer Vision (ICCV), p.11, 2013.
DOI: 10.1109/ICCV.2013.441
URL: https://hal.archives-ouvertes.fr/hal-00873267
Improving the Fisher Kernel for Large-Scale Image Classification, European Conference on Computer Vision (ECCV), p.10, 2010.
DOI: 10.1007/978-3-642-15561-1_11
URL: https://hal.archives-ouvertes.fr/inria-00548630
Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations (ICLR), p.10, 2015.
Very deep convolutional networks for large-scale image recognition, pp.1556-1566, 2014.
Two-stream convolutional networks for action recognition in videos, Neural Information Processing Systems (NIPS), p.10, 2014.
Learning Spatiotemporal Features with 3D Convolutional Networks, 2015 IEEE International Conference on Computer Vision (ICCV), p.11, 2015.
DOI: 10.1109/ICCV.2015.510
URL: http://arxiv.org/abs/1412.0767
Microsoft COCO Captions: Data collection and evaluation server, p.13, 2015.
Exploring nearest neighbor approaches for image captioning, arXiv preprint arXiv:1505.04467, 2015.
Sequence to Sequence -- Video to Text, 2015 IEEE International Conference on Computer Vision (ICCV), pp.4534-4542, 2015.
DOI: 10.1109/ICCV.2015.515