Structured Output Prediction and Learning for Deep Monocular 3D Human Pose Estimation

Stefan Kinauer; Riza Alp Güler; Siddhartha Chandra; Iasonas Kokkinos

Conference paper, Year: 2017

Stefan Kinauer

Role: Author

Centre de vision numérique

Organ Modeling through Extraction, Representation and Understanding of Medical Image Content

Riza Alp Güler

Role: Author

Organ Modeling through Extraction, Representation and Understanding of Medical Image Content

Siddhartha Chandra

Role: Author
PersonId : 975543

Organ Modeling through Extraction, Representation and Understanding of Medical Image Content

Iasonas Kokkinos

Role: Author
PersonId : 865671

Organ Modeling through Extraction, Representation and Understanding of Medical Image Content

Abstract

In this work we address the problem of estimating 3D human pose from a single RGB image by blending a feed-forward CNN with a graphical model that couples the 3D positions of parts. The CNN populates a volumetric output space that represents the possible positions of 3D human joints, and also regresses the estimated displacements between pairs of parts. These constitute the 'unary' and 'pairwise' terms of the energy of a graphical model that resides in a 3D label space and delivers an optimal 3D pose configuration at its output. The CNN is trained on the Human3.6M 3D human pose dataset, and the graphical model is trained jointly with the CNN in an end-to-end manner, allowing us to exploit both the discriminative power of CNNs and the top-down information pertaining to human pose. We (a) introduce memory-efficient methods for obtaining accurate voxel estimates for parts by blending quantization with regression, (b) employ efficient structured prediction algorithms for 3D pose estimation using branch-and-bound, and (c) develop a framework for qualitative and quantitative comparison of competing graphical models. We evaluate our work on the Human3.6M dataset, demonstrating that exploiting the structure of the human pose in 3D yields systematic gains.
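
To make the two ingredients of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a cubic voxel grid, a regressed offset expressed in voxel units, and a quadratic penalty on deviations from the regressed displacement; none of these details are stated in the abstract. All function and parameter names (decode_joint, pose_energy, brute_force_map, weight) are hypothetical. Exhaustive search stands in for the branch-and-bound optimizer mentioned in the abstract and is only feasible on toy-sized grids.

```python
import itertools

import numpy as np


def decode_joint(voxel_index, residual, voxel_size, origin):
    """Blend quantization with regression (assumed form): start from the
    centre of the selected voxel and add a regressed sub-voxel offset,
    given in voxel units, to obtain a continuous 3D position."""
    centre = (np.asarray(voxel_index, dtype=float) + 0.5) * voxel_size + origin
    return centre + np.asarray(residual, dtype=float) * voxel_size


def pose_energy(labels, unary, edges, displacements, weight=1.0):
    """Energy of one joint labelling.

    labels        : one voxel index (tuple of 3 ints) per joint
    unary         : one 3D cost volume per joint
    edges         : list of (i, j) joint pairs coupled by the model
    displacements : dict (i, j) -> expected x_j - x_i, in voxel units
    The quadratic pairwise penalty is an assumption made for illustration.
    """
    energy = sum(float(unary[i][tuple(label)]) for i, label in enumerate(labels))
    for i, j in edges:
        diff = np.asarray(labels[j]) - np.asarray(labels[i]) - displacements[(i, j)]
        energy += weight * float(diff @ diff)
    return energy


def brute_force_map(unary, edges, displacements, weight=1.0):
    """Exhaustively minimise the energy over all joint labellings.
    This is a stand-in for branch-and-bound and scales exponentially
    with the number of joints."""
    voxels = list(np.ndindex(*unary[0].shape))
    return min(
        itertools.product(voxels, repeat=len(unary)),
        key=lambda labels: pose_energy(labels, unary, edges, displacements, weight),
    )
```

With two joints on a 2x2x2 grid, a unary volume for the second joint that is equally low at two voxels, and an expected displacement of one voxel along the first axis, the pairwise term resolves the ambiguity in favour of the voxel consistent with that displacement. This is the kind of structural gain the abstract refers to.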

Domains

Computer Vision and Pattern Recognition [cs.CV]; Machine Learning [cs.LG]

Main file

camera ready (1).pdf (5.06 MB)

Origin: Files produced by the author(s)

Contributor: Stefan Kinauer

https://inria.hal.science/hal-01672592

Submitted on: Tuesday, December 26, 2017, 12:43:34

Last modified on: Wednesday, March 15, 2023, 08:56:16

Long-term archiving on: Tuesday, March 27, 2018, 12:12:03

Dates and versions

hal-01672592, version 1 (26-12-2017)

Identifiers

HAL Id: hal-01672592, version 1

Cite

Stefan Kinauer, Riza Alp Güler, Siddhartha Chandra, Iasonas Kokkinos. Structured Output Prediction and Learning for Deep Monocular 3D Human Pose Estimation. EMMCVPR 2017 - 11th International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition, Oct 2017, Venice, Italy. pp. 1-14. ⟨hal-01672592⟩

Collections

INRIA CVN CENTRALESUPELEC INRIA2 UNIV-PARIS-SACLAY GS-ENGINEERING GS-COMPUTER-SCIENCE

264 views

393 downloads
