
Multimodal Deep Neural Networks for Pose Estimation and Action Recognition

Abstract : In this work, we present a unified multimodal neural network for pose estimation from RGB images and action recognition from video sequences. We show that a multimodal approach benefits 3D pose estimation by mixing high-precision 3D data with “in the wild” 2D annotated images, while action recognition also benefits from better visual features. Furthermore, our experiments demonstrate that end-to-end optimization yields better action recognition performance than separate learning. We report state-of-the-art results on 3D pose estimation and action recognition on the Human3.6M and NTU RGB+D datasets, respectively.
Document type : Conference papers

Cited literature : 39 references
Contributor : Diogo Luvizon
Submitted on : Thursday, June 14, 2018 - 2:07:15 PM
Last modification on : Thursday, March 5, 2020 - 4:25:50 PM
Long-term archiving on : Saturday, September 15, 2018 - 2:06:12 PM


Files produced by the author(s)


  • HAL Id : hal-01815707, version 1


Diogo Luvizon, Hedi Tabia, David Picard. Multimodal Deep Neural Networks for Pose Estimation and Action Recognition. Congrès Reconnaissance des Formes, Image, Apprentissage et Perception (RFIAP 2018), Jun 2018, Marne-la-Vallée, France. ⟨hal-01815707⟩


