Mapping Sounds on Images Using Binaural Spectrograms

Antoine Deleforge 1,*, Vincent Drouard 1, Laurent Girin 2, Radu Horaud 1
* Corresponding author
1 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
2 GIPSA-MAGIC - MAGIC
GIPSA-DPC - Département Parole et Cognition
Abstract: We propose a novel method for mapping sound spectrograms onto images, thereby enabling alignment between auditory and visual features for subsequent multimodal processing. We adopt a supervised learning approach to this audio-visual fusion problem, which proceeds in two steps. First, we use a Gaussian mixture of locally-linear regressions to learn a mapping from image locations to binaural spectrograms. Second, we derive a closed-form expression for the conditional posterior probability of an image location, given both an observed spectrogram emitted from an unknown source direction and the previously learnt mapping parameters. Notably, the proposed method can handle completely different spectrograms for training and for alignment: fixed-length, wide-spectrum sounds are used for learning, so that the regression is estimated fully and robustly, while variable-length, sparse-spectrum sounds, e.g., speech, are used for alignment. The proposed method successfully extracts the image location of speech utterances in realistic reverberant-room scenarios.
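The second step, inverting a learnt mixture of locally-linear regressions to obtain a posterior over image locations, can be illustrated with a minimal sketch. It assumes a scalar image coordinate x, a per-component prior x | z=k ~ N(c_k, gamma_k), a forward model y | x, z=k ~ N(a_k x + b_k, diag(s2_k)) with diagonal noise, and a single feature vector y rather than a full spectrogram. All parameter names and values are hypothetical, and the code applies the generic linear-Gaussian Bayes inversion rather than reproducing the authors' exact derivation. Spectral entries marked None are skipped; with diagonal noise this is exactly marginalising them out, which is one assumed way to treat sparse-spectrum input.

```python
import math


def posterior_location(y, components):
    """Gaussian-mixture posterior p(x | y) for a scalar location x (illustrative sketch).

    Each component is a dict with hypothetical keys:
      pi        - prior weight p(z = k)
      c, gamma  - prior mean and variance of x given z = k
      a, b      - per-dimension slope and offset of the forward model a*x + b
      s2        - per-dimension noise variances (diagonal covariance)
    Entries of y set to None are treated as missing and skipped.
    Returns (weights, means, variances) of the posterior mixture.
    """
    obs = [d for d, v in enumerate(y) if v is not None]
    log_w, means, variances = [], [], []
    for p in components:
        g = p["gamma"]
        a = [p["a"][d] for d in obs]
        s2 = [p["s2"][d] for d in obs]
        # Residual with respect to the predicted mean a*c + b of y under component k
        r = [y[d] - p["a"][d] * p["c"] - p["b"][d] for d in obs]
        u = sum(ai * ai / si for ai, si in zip(a, s2))         # a^T D^-1 a
        v = sum(ai * ri / si for ai, ri, si in zip(a, r, s2))  # a^T D^-1 r
        q = sum(ri * ri / si for ri, si in zip(r, s2))         # r^T D^-1 r
        denom = 1.0 + g * u
        # Marginal p(y | z=k) = N(y; a*c + b, D + g a a^T), evaluated with
        # Sherman-Morrison (quadratic form) and the determinant lemma (log-det)
        maha = q - g * v * v / denom
        logdet = sum(math.log(si) for si in s2) + math.log(denom)
        log_w.append(math.log(p["pi"])
                     - 0.5 * (len(obs) * math.log(2 * math.pi) + logdet + maha))
        # Posterior of x given y and z = k (standard linear-Gaussian result)
        var_k = 1.0 / (1.0 / g + u)
        y_minus_b = [y[d] - p["b"][d] for d in obs]
        mean_k = var_k * (sum(ai * ti / si for ai, ti, si in zip(a, y_minus_b, s2))
                          + p["c"] / g)
        means.append(mean_k)
        variances.append(var_k)
    # Normalise the component responsibilities in a numerically stable way
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w], means, variances


# Usage with two made-up components centred at x = 0 and x = 10;
# the second spectral entry is missing.
components = [
    {"pi": 0.5, "c": 0.0, "gamma": 1.0, "a": [1.0, 1.0], "b": [0.0, 0.0], "s2": [0.01, 0.01]},
    {"pi": 0.5, "c": 10.0, "gamma": 1.0, "a": [1.0, 1.0], "b": [0.0, 0.0], "s2": [0.01, 0.01]},
]
weights, means, variances = posterior_location([10.0, None], components)
x_hat = sum(wk * mk for wk, mk in zip(weights, means))  # posterior mean of x
```

Pure Python keeps the sketch dependency-free; the Sherman-Morrison identity and the matrix determinant lemma evaluate each component's marginal likelihood without forming or inverting a D x D matrix.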
Document type: Conference paper
22nd European Signal Processing Conference (EUSIPCO-2014), Sep 2014, Lisbon, Portugal. IEEE, pp. 2470-2474, 2014, http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6952934

Cited literature: 18 references


https://hal.archives-ouvertes.fr/hal-01019287
Contributor: Team Perception
Submitted on: Monday, 7 July 2014, 10:45:42
Last modified on: Wednesday, 11 April 2018, 01:58:40
Document(s) archived on: Tuesday, 7 October 2014, 11:52:51

Files

Deleforge-EUSIPCO-AV.pdf
Files produced by the author(s)

Identifiers

• HAL Id: hal-01019287, version 1

Citation

Antoine Deleforge, Vincent Drouard, Laurent Girin, Radu Horaud. Mapping Sounds on Images Using Binaural Spectrograms. 22nd European Signal Processing Conference (EUSIPCO-2014), Sep 2014, Lisbon, Portugal. IEEE, pp. 2470-2474, 2014, http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6952934. hal-01019287

