A conditional random field approach for audio-visual people diarization

Abstract: We investigate the problem of audio-visual (AV) person diarization in broadcast data, that is, automatically associating the faces and voices of people and determining when they appear or speak in the video. The contributions are twofold. First, we formulate the problem within a novel CRF framework that simultaneously performs the AV association of voices and face clusters to build AV person models, and the joint segmentation of the audio and visual streams using a set of AV cues and their association strength. Second, for this AV association strength we use a score that relies not only on lip activity but also on contextual visual information (face size, position, number of detected faces, ...), which leads to more reliable association measures. Experiments on 6 hours of broadcast data show that our framework improves AV person diarization, especially for speaker segments erroneously labeled in the mono-modal case.
Document type:
Conference paper
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), 2014, Florence, Italy. pp. 116-120, 2014, Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. 〈10.1109/ICASSP.2014.6853569〉

https://hal.archives-ouvertes.fr/hal-01433223
Contributor: Sylvain Meignier
Submitted on: Saturday, April 1, 2017 - 00:44:41
Last modified on: Thursday, February 7, 2019 - 17:55:48
Document(s) archived on: Sunday, July 2, 2017 - 12:20:16

File

GayKhouryMeignierOdobezDelegli...
Files produced by the author(s)

Citation

Paul Gay, Elie Khoury, Sylvain Meignier, Jean-Marc Odobez, Paul Deléglise. A conditional random field approach for audio-visual people diarization. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), 2014, Florence, Italy. pp. 116-120, 2014, Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. 〈10.1109/ICASSP.2014.6853569〉. 〈hal-01433223〉
