Exploiting the Complementarity of Audio and Visual Data in Multi-Speaker Tracking

Yutong Ban (1), Laurent Girin (1, 2), Xavier Alameda-Pineda (1), Radu Horaud (1)
(1) PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
(2) GIPSA-MAGIC - MAGIC
GIPSA-DPC - Département Parole et Cognition
Abstract: Multi-speaker tracking is a central problem in human-robot interaction. In this context, exploiting auditory and visual information is gratifying and challenging at the same time. Gratifying because the complementary nature of auditory and visual information allows us to be more robust against noise and outliers than unimodal approaches. Challenging because how to properly fuse auditory and visual information for multi-speaker tracking is far from being a solved problem. In this paper, we propose a probabilistic generative model that tracks multiple speakers by jointly exploiting auditory and visual features in their own representation spaces. Importantly, the method is robust to missing data and is therefore able to track even when observations from one of the modalities are absent. Quantitative and qualitative results on the AVDIAR dataset are reported.
Document type:
Conference paper
ICCV Workshop on Computer Vision for Audio-Visual Media, Oct 2017, Venezia, Italy. 2017
https://hal.inria.fr/hal-01577965
Contributor: Team Perception
Submitted on: Monday, August 28, 2017 - 15:07:40
Last modified on: Friday, September 8, 2017 - 15:57:34

Files

ICCVW_submission.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01577965, version 1

Citation

Yutong Ban, Laurent Girin, Xavier Alameda-Pineda, Radu Horaud. Exploiting the Complementarity of Audio and Visual Data in Multi-Speaker Tracking. ICCV Workshop on Computer Vision for Audio-Visual Media, Oct 2017, Venezia, Italy. 2017. <hal-01577965>
