Conference papers

Audio-Visual Speaker Localization via Weighted Clustering

Israel-Dejene Gebru 1,*, Xavier Alameda-Pineda 1, Radu Horaud 1, Florence Forbes 2
* Corresponding author
1 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
2 MISTIS - Modelling and Inference of Complex and Structured Stochastic Systems
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract : In this paper we address the problem of detecting and locating speakers using audiovisual data, which we cast as a clustering problem. We propose a novel weighted clustering method based on a finite mixture model which explores the idea of non-uniform weighting of observations. Weighted-data clustering techniques have already been proposed, but not in a generative setting as presented here. We introduce a weighted-data mixture model and formally derive the associated EM procedure. The clustering algorithm is applied to the problem of detecting and localizing a speaker over time using both visual and auditory observations gathered with a single camera and two microphones. Audiovisual fusion is enforced by introducing a cross-modal weighting scheme. We test the robustness of the method with experiments in two challenging scenarios: disambiguating between an active and a non-active speaker, and associating a speech signal with a person.
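The abstract does not spell out the weighted-data mixture model, so the following is only a minimal sketch under a stated assumption: each observation x_i carries a fixed weight w_i that scales the precision of every component, so that the variance seen by x_i under component k is var_k / w_i. That assumption, the function name `weighted_em_1d`, the 1-D setting, and the toy data are illustrative and are not taken from the paper's text.

```python
import math


def weighted_em_1d(xs, ws, mus, variances, pis, n_iter=50):
    """EM for a 1-D Gaussian mixture with per-observation weights.

    Assumption (not from the abstract): observation i is drawn from
    component k with variance variances[k] / ws[i], so a low weight
    makes the observation less informative.
    """
    n, k = len(xs), len(mus)
    mus, variances, pis = list(mus), list(variances), list(pis)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x, w in zip(xs, ws):
            row = []
            for j in range(k):
                var = variances[j] / w
                dens = math.exp(-0.5 * (x - mus[j]) ** 2 / var)
                dens /= math.sqrt(2.0 * math.pi * var)
                row.append(pis[j] * dens)
            total = sum(row)
            resp.append([r / total for r in row])
        # M-step: weights enter the mean and variance updates.
        for j in range(k):
            r_sum = sum(resp[i][j] for i in range(n))
            wr_sum = sum(ws[i] * resp[i][j] for i in range(n))
            pis[j] = r_sum / n
            mus[j] = sum(ws[i] * resp[i][j] * xs[i] for i in range(n)) / wr_sum
            sq = sum(ws[i] * resp[i][j] * (xs[i] - mus[j]) ** 2 for i in range(n))
            variances[j] = max(sq / r_sum, 1e-6)  # floor avoids collapse
    return mus, variances, pis, resp


# Toy data: two tight groups plus one unreliable point (weight 0.01) between them.
xs = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1, 2.5]
ws = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.01]
mus, variances, pis, resp = weighted_em_1d(xs, ws, [1.0, 4.0], [1.0, 1.0], [0.5, 0.5])
print(mus, pis)
```

In this sketch the low-weight point barely moves either cluster mean, which is the intuition behind non-uniform weighting. How the weights themselves are set, for example through the cross-modal scheme mentioned in the abstract, is not modelled here.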

https://hal.archives-ouvertes.fr/hal-01053732
Contributor : Team Perception
Submitted on : Monday, August 11, 2014 - 4:26:58 PM
Last modification on : Thursday, March 26, 2020 - 8:49:35 PM
Document(s) archived on : Tuesday, November 25, 2014 - 10:51:05 PM

Files

mainCameraReady-HAL.pdf
Files produced by the author(s)

Citation

Israel-Dejene Gebru, Xavier Alameda-Pineda, Radu Horaud, Florence Forbes. Audio-Visual Speaker Localization via Weighted Clustering. IEEE Workshop on Machine Learning for Signal Processing, Sep 2014, Reims, France. pp.1-6, ⟨10.1109/MLSP.2014.6958874⟩. ⟨hal-01053732⟩
