
Action and Event Recognition with Fisher Vectors on a Compact Feature Set

Dan Oneata (1), Jakob Verbeek (1), Cordelia Schmid (1)
(1) LEAR - Learning and recognition in vision
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology
Abstract : Action recognition in uncontrolled video is an important and challenging computer vision problem. Recent progress in this area is due to new local features and to models that capture spatio-temporal structure between local features, or human-object interactions. Instead of working towards more complex models, we focus on the low-level features and their encoding. We evaluate the use of Fisher vectors as an alternative to bag-of-words histograms to aggregate a small set of state-of-the-art low-level descriptors, in combination with linear classifiers. We present a large and varied set of evaluations, considering (i) classification of short actions in five datasets, (ii) localization of such actions in feature-length movies, and (iii) large-scale recognition of complex events. We find that for basic action recognition and localization, MBH (motion boundary histogram) features alone are enough for state-of-the-art performance. For complex events, we find that SIFT and MFCC features provide complementary cues. On all three problems we obtain state-of-the-art results, while using fewer features and less complex models.
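As a rough illustration of the Fisher vector encoding mentioned in the abstract, the sketch below aggregates a set of local descriptors under a diagonal-covariance Gaussian mixture model. It follows the widely used formulation (gradients with respect to the component means and standard deviations, then power and L2 normalization); it is a minimal sketch under stated assumptions, not the authors' implementation. The function name `fisher_vector`, the random stand-in descriptors and the GMM parameters are hypothetical. In practice the GMM would be fit with EM on training descriptors such as MBH, and a linear classifier would then be trained on the resulting vectors; neither step is shown.

```python
import numpy as np


def fisher_vector(x, weights, means, sigmas):
    """Hypothetical helper: encode descriptors x (N x D) with a diagonal GMM (K components).

    weights: (K,) mixing weights, means: (K, D), sigmas: (K, D) standard deviations.
    Returns a 2*K*D vector (mean and std-dev gradients), power- and L2-normalized.
    """
    x = np.atleast_2d(x)
    n, d = x.shape
    # Standardized differences of every descriptor to every component: (N, K, D)
    diff = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # Log of weight times Gaussian density for each descriptor/component pair: (N, K)
    log_p = (-0.5 * np.sum(diff ** 2, axis=2)
             - np.sum(np.log(sigmas), axis=1)[None, :]
             - 0.5 * d * np.log(2 * np.pi)
             + np.log(weights)[None, :])
    # Soft assignments (posteriors) via a numerically stable softmax over components
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Normalized gradients w.r.t. means and standard deviations: (K, D) each
    g_mu = np.einsum("nk,nkd->kd", gamma, diff) / (n * np.sqrt(weights)[:, None])
    g_sigma = np.einsum("nk,nkd->kd", gamma, diff ** 2 - 1) / (n * np.sqrt(2 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    # Signed square-root (power) normalization, then L2 normalization
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


# Usage with random stand-in data (assumed values, not real MBH descriptors)
rng = np.random.default_rng(0)
K, D = 4, 3
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))
descriptors = rng.normal(size=(100, D))
fv = fisher_vector(descriptors, weights, means, sigmas)
print(fv.shape)  # (24,)
```

Compared with a bag-of-words histogram, which keeps only K assignment counts, the Fisher vector keeps first- and second-order statistics per component (2KD values). This richer, high-dimensional representation is one commonly cited reason it works well with simple linear classifiers.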
Document type :
Conference papers

Cited literature : 44 references
Contributor : THOTH Team
Submitted on : Wednesday, February 19, 2014 - 4:34:17 PM
Last modification on : Thursday, January 20, 2022 - 5:28:02 PM
Long-term archiving on : Sunday, April 9, 2017 - 2:18:57 PM






Dan Oneata, Jakob Verbeek, Cordelia Schmid. Action and Event Recognition with Fisher Vectors on a Compact Feature Set. ICCV - IEEE International Conference on Computer Vision, Dec 2013, Sydney, Australia. pp.1817-1824, ⟨10.1109/ICCV.2013.228⟩. ⟨hal-00873662v2⟩


