Multimodal Deep Neural Networks for Pose Estimation and Action Recognition

Abstract: In this work, we present a unified multimodal neural network for pose estimation from RGB images and action recognition from video sequences. We show that a multimodal approach benefits 3D pose estimation by mixing high-precision 3D data with “in the wild” 2D-annotated images, while action recognition also benefits from the improved visual features. Furthermore, our experiments demonstrate that end-to-end optimization yields better action recognition performance than separate learning. We report state-of-the-art results on 3D pose estimation and action recognition on the Human3.6M and NTU RGB+D datasets, respectively.
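
The sketch below illustrates the general idea described in the abstract, not the authors' architecture: a shared visual backbone feeding a pose-regression head and a clip-level action-classification head, optimized end-to-end with a joint loss. All layer sizes, module names, joint/action counts, and the unweighted loss sum are illustrative assumptions.

```python
# Minimal multitask sketch (assumed, not the paper's model): shared backbone,
# per-frame 3D pose head, clip-level action head, trained end-to-end.
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Small convolutional feature extractor shared by both tasks (illustrative)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x):                   # x: (B, 3, H, W)
        return self.features(x).flatten(1)  # (B, 64)

class PoseActionNet(nn.Module):
    """Multitask model: pose regression per frame, action label per clip."""
    def __init__(self, num_joints=17, num_actions=60):
        super().__init__()
        self.backbone = SharedBackbone()
        self.pose_head = nn.Linear(64, num_joints * 3)  # 3D joint coordinates
        self.action_head = nn.Linear(64, num_actions)   # clip-level classifier

    def forward(self, clip):                # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1))            # (B*T, 64)
        poses = self.pose_head(feats).view(b, t, -1, 3)      # (B, T, J, 3)
        pooled = feats.view(b, t, -1).mean(dim=1)            # temporal pooling
        return poses, self.action_head(pooled)               # (B, num_actions)

# Joint end-to-end objective: pose regression + action classification.
model = PoseActionNet()
clip = torch.randn(2, 4, 3, 64, 64)
gt_pose = torch.randn(2, 4, 17, 3)
gt_action = torch.randint(0, 60, (2,))
poses, logits = model(clip)
loss = nn.functional.l1_loss(poses, gt_pose) + nn.functional.cross_entropy(logits, gt_action)
loss.backward()
```

Because the backbone is shared and both losses are back-propagated together, the visual features are shaped by both tasks, which is the kind of coupling the abstract contrasts with separate learning.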
Document type: Conference paper

Cited literature: 39 references

https://hal.archives-ouvertes.fr/hal-01815707
Contributor: Diogo Luvizon
Submitted on: Thursday, June 14, 2018 - 2:07:15 PM
Last modification on: Friday, July 5, 2019 - 3:26:03 PM
Long-term archiving on: Saturday, September 15, 2018 - 2:06:12 PM

File

friap18.pdf (files produced by the author(s))

Identifiers

  • HAL Id: hal-01815707, version 1

Citation

Diogo Luvizon, Hedi Tabia, David Picard. Multimodal Deep Neural Networks for Pose Estimation and Action Recognition. Congrès Reconnaissance des Formes, Image, Apprentissage et Perception (RFIAP 2018), Jun 2018, Marne-la-Vallée, France. ⟨hal-01815707⟩
