A conditional random field approach for audio-visual people diarization - HAL open archive
Conference paper. Year: 2014


Abstract

We investigate the problem of audiovisual (AV) person diarization in broadcast data, that is, automatically associating the faces and voices of people and determining when they appear or speak in the video. Our contributions are twofold. First, we formulate the problem within a novel CRF framework that simultaneously performs the AV association of voice and face clusters to build AV person models, and the joint segmentation of the audio and visual streams using a set of AV cues and their association strength. Second, for this AV association strength we use a score that relies not only on lip activity but also on contextual visual information (face size, position, number of detected faces, etc.), which leads to more reliable association measures. Experiments on 6 hours of broadcast data show that our framework improves AV person diarization, especially for speaker segments that are mislabeled in the mono-modal case.
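The abstract describes the model only at a high level. As a rough illustration of how a CRF can combine mono-modal scores with an AV association strength when labeling a sequence of segments, here is a minimal sketch that runs Viterbi decoding on a simplified linear-chain CRF. It is not the authors' model. The potentials (audio log-scores plus face log-scores weighted by a per-segment association strength), the continuity transition matrix, the function name `viterbi_decode` and all numbers are hypothetical. The association strength is supplied directly as input. In the paper it comes from lip activity together with contextual cues such as face size, position and the number of detected faces.

```python
import numpy as np


def viterbi_decode(unary, transition):
    """Highest-scoring label sequence of a linear-chain CRF (illustrative sketch).

    unary:      (T, K) log-potentials, one row per segment, one column per person.
    transition: (K, K) log-potentials for label pairs of consecutive segments.
    Returns (labels, best_score).
    """
    T, K = unary.shape
    score = unary[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # candidate[i, j]: best path ending in label i at t-1, then label j at t
        candidate = score[:, None] + transition + unary[t][None, :]
        backptr[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    # Backtrack from the best final label.
    labels = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        labels.append(int(backptr[t, labels[-1]]))
    return labels[::-1], float(score.max())


# Hypothetical toy data: 4 segments, 2 candidate AV persons.
audio = np.log([[0.9, 0.1], [0.45, 0.55], [0.8, 0.2], [0.2, 0.8]])  # voice-model scores
face = np.log([[0.8, 0.2], [0.9, 0.1], [0.7, 0.3], [0.3, 0.7]])     # face-model scores
# Hypothetical AV association strength per segment (how likely the visible face is the speaker).
assoc = np.array([0.9, 0.9, 0.5, 0.9])
# Face evidence contributes only as much as the association strength allows.
unary = audio + assoc[:, None] * face
# Hypothetical continuity prior: consecutive segments tend to keep the same person.
transition = np.log([[0.8, 0.2], [0.2, 0.8]])

labels, best = viterbi_decode(unary, transition)
audio_only = audio.argmax(axis=1).tolist()
print(labels, audio_only)  # [0, 0, 0, 1] [0, 1, 0, 1]
```

Taking the audio argmax segment by segment labels segment 1 as person 1. The combined potentials and the continuity prior label it as person 0. Weighting the face term by the association strength keeps a poorly associated face (segment 2 here) from dominating the decision.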
Main file
GayKhouryMeignierOdobezDeleglise-CRF-AV-Diarization-ICASSP-2014.pdf (2.26 MB)
Origin: files produced by the author(s)

Dates and versions

hal-01433223, version 1 (01-04-2017)

Identifiers

Cite

Paul Gay, Elie Khoury, Sylvain Meignier, Jean-Marc Odobez, Paul Deléglise. A conditional random field approach for audio-visual people diarization. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), 2014, Florence, Italy. pp. 116-120, ⟨10.1109/ICASSP.2014.6853569⟩. ⟨hal-01433223⟩