Audiovisual data fusion for successive speakers tracking

Quentin Labourey; Olivier Aycard; Denis Pellerin; Michèle Rombaut

Conference paper. Year: 2014

Quentin Labourey

Role: Author
PersonId : 785673
IdRef : 223390852

Laboratoire d'Informatique de Grenoble

Grenoble Images Parole Signal Automatique

Olivier Aycard

Role: Author
PersonId : 770572
IdRef : 153791713

Analyse de données, Modélisation et Apprentissage automatique [Grenoble]

Denis Pellerin

Role: Author
PersonId : 20254
IdHAL : denis-pellerin
ORCID : 0000-0002-3792-1706
IdRef : 060908998

GIPSA - Architecture, Géométrie, Perception, Images, Gestes

Michèle Rombaut

Role: Author
PersonId : 20146
IdHAL : michele-rombaut
ORCID : 0000-0003-4633-8501
IdRef : 10450059X

GIPSA - Architecture, Géométrie, Perception, Images, Gestes

Abstract

This paper presents a method for tracking human speakers using audio and video data, applied to conversation tracking with a robot. Audiovisual data fusion is performed in a two-step process. First, detection is performed independently on each modality: face detection based on skin color on the video data, and sound source localization based on the time delay of arrival on the audio data. The results of these detection processes are then fused using an adaptation of the Bayesian filter to detect the speaker. The robot is able to detect the face of the person talking and to detect a new speaker in a conversation.
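The fusion step described in the abstract can be illustrated with a discrete Bayesian filter over horizontal position bins. The sketch below is not the authors' implementation: the bin layout, the Gaussian likelihood shapes, the `stay_prob` transition model, and every function name are assumptions chosen for illustration. It takes one audio localization per time step and a fixed set of detected face positions, and returns the most probable speaker bin at each step.

```python
import math


def audio_likelihood(n_bins, source_bin, sigma=3.0, floor=0.05):
    """Hypothetical audio model: broad Gaussian around the localized sound source.
    The floor keeps every bin possible, so one bad detection cannot zero the belief."""
    return [floor + math.exp(-0.5 * ((i - source_bin) / sigma) ** 2)
            for i in range(n_bins)]


def video_likelihood(n_bins, face_bins, sigma=1.0, floor=0.05):
    """Hypothetical video model: narrow Gaussian around each detected face."""
    return [floor + max((math.exp(-0.5 * ((i - f) / sigma) ** 2) for f in face_bins),
                        default=0.0)
            for i in range(n_bins)]


def predict(belief, stay_prob=0.8):
    """Transition step: the speaker stays put with probability stay_prob,
    otherwise the floor passes to a speaker at any bin with equal probability."""
    jump = (1.0 - stay_prob) / len(belief)
    return [stay_prob * b + jump for b in belief]


def update(belief, audio_lik, video_lik):
    """Measurement step: fuse both modalities, assuming they are
    conditionally independent given the speaker position."""
    posterior = [b * a * v for b, a, v in zip(belief, audio_lik, video_lik)]
    total = sum(posterior)
    return [p / total for p in posterior]


def track(audio_bins, face_bins, n_bins=36):
    """Run the filter over a sequence of audio detections; return the
    most probable speaker bin after each step."""
    belief = [1.0 / n_bins] * n_bins  # uniform prior
    vid = video_likelihood(n_bins, face_bins)
    estimates = []
    for source_bin in audio_bins:
        belief = predict(belief)
        belief = update(belief, audio_likelihood(n_bins, source_bin), vid)
        estimates.append(max(range(n_bins), key=belief.__getitem__))
    return estimates


# Two faces at bins 5 and 20; the sound source moves from near the first
# face to near the second, imitating a change of speaker.
estimates = track([6] * 10 + [19] * 10, face_bins=[5, 20])
print(estimates)
```

In this toy setting the coarse audio estimate selects which face is speaking, while the sharper video likelihood pins the estimate to the face position. The uniform jump term in `predict` is what lets the belief move to a new speaker after a few steps.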

Domains

Computer Vision and Pattern Recognition [cs.CV]

Main file

LABOUREY_VISAPP_2014.pdf (888.89 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00935636

Submitted on: Friday, January 31, 2014, 15:59:26

Last modified on: Thursday, April 4, 2024, 21:30:53

Long-term archiving on: Saturday, April 8, 2017, 22:15:16

Dates and versions

hal-00935636 , version 1 (31-01-2014)

Identifiers

HAL Id : hal-00935636 , version 1

Cite

Quentin Labourey, Olivier Aycard, Denis Pellerin, Michèle Rombaut. Audiovisual data fusion for successive speakers tracking. VISIGRAPP 2014 - 9th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Jan 2014, Lisbon, Portugal. ⟨hal-00935636⟩

Collections

UGA CNRS GIPSA GIPSA-DIS LIG GIPSA-AGPIG PERSYVAL-LAB POLYTECH-GRENOBLE ANR LIG_SIDCH LIG_SIDCH_APTIKAL

514 views

265 downloads