Audiovisual data fusion for successive speakers tracking

Abstract: In this paper, a method for tracking a human speaker from audio and video data is presented. It is applied to conversation tracking with a robot. Audiovisual data fusion is performed in a two-step process. Detection is performed independently on each modality: face detection based on skin color on video data, and sound source localization based on the time delay of arrival on audio data. The results of these detection processes are then fused through an adapted Bayesian filter to detect the speaker. The robot is able to detect the face of the talking person and to detect a new speaker in a conversation.
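The audio detection step mentioned in the abstract relies on the time delay of arrival between microphones. As a minimal sketch of that idea (not the paper's implementation; the function names, the two-microphone far-field geometry, and all parameter values are illustrative assumptions), the delay can be estimated from the cross-correlation peak between two microphone signals and converted into a bearing angle:

```python
import numpy as np

def tdoa_delay(sig_left, sig_right, fs):
    """Estimate the time delay of arrival (seconds) between two
    microphone signals from the lag of their cross-correlation peak.
    A negative delay means the sound reached the left microphone first."""
    corr = np.correlate(sig_left, sig_right, mode="full")
    lag = np.argmax(corr) - (len(sig_right) - 1)
    return lag / fs

def source_angle(delay, mic_distance, c=343.0):
    """Convert a time delay into a bearing angle (radians) for a
    two-microphone array, assuming a far-field source and speed of
    sound c in m/s."""
    ratio = np.clip(c * delay / mic_distance, -1.0, 1.0)
    return np.arcsin(ratio)

# Synthetic check: a noise burst reaching the left microphone
# 5 samples before the right one.
fs = 16000
rng = np.random.default_rng(0)
burst = rng.standard_normal(1024)
left = np.concatenate([burst, np.zeros(5)])
right = np.concatenate([np.zeros(5), burst])
delay = tdoa_delay(left, right, fs)          # -5 / 16000 seconds
angle = source_angle(delay, mic_distance=0.2)
```

With this sign convention a negative angle places the source on the left of the array; the paper's actual localization and its fusion with the video detections via the Bayesian filter are described in the full text.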
Document type:
Conference paper
VISIGRAPP - 9th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP 2014), Jan 2014, Lisbon, Portugal. 2014


https://hal.archives-ouvertes.fr/hal-00935636
Contributor: Quentin Labourey <>
Submitted on: Friday, January 31, 2014 - 15:59:26
Last modified on: Wednesday, June 17, 2015 - 01:17:04

File

LABOUREY_VISAPP_2014.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00935636, version 1

Citation

Quentin Labourey, Olivier Aycard, Denis Pellerin, Michèle Rombaut. Audiovisual data fusion for successive speakers tracking. VISIGRAPP - 9th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP 2014), Jan 2014, Lisbon, Portugal. 2014. <hal-00935636>


Metrics

Record views: 355
Document downloads: 171