A conditional random field approach for audio-visual people diarization

Paul Gay; Elie Khoury; Sylvain Meignier; Jean-Marc Odobez; Paul Deléglise

doi:10.1109/ICASSP.2014.6853569

Conference paper, Year: 2014


Paul Gay (Author): Laboratoire d'Informatique de l'Université du Mans; IDIAP Research Institute

Elie Khoury (Author): IDIAP Research Institute

Sylvain Meignier (Author): Laboratoire d'Informatique de l'Université du Mans

Jean-Marc Odobez (Author): IDIAP Research Institute

Paul Deléglise (Author, PersonId: 998324): Laboratoire d'Informatique de l'Université du Mans

Abstract

We investigate the problem of audiovisual (AV) person diarization in broadcast data, that is, automatically associating the faces and voices of people and determining when they appear or speak in the video. The contributions are twofold. First, we formulate the problem within a novel conditional random field (CRF) framework that simultaneously performs the AV association of voice and face clusters to build AV person models, and the joint segmentation of the audio and visual streams using a set of AV cues and their association strength. Second, for this AV association strength we use a score that relies not only on lip activity but also on contextual visual information (face size, position, number of detected faces, etc.), which leads to more reliable association measures. Experiments on 6 hours of broadcast data show that our framework improves AV person diarization, especially for speaker segments erroneously labeled in the mono-modal case.
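The abstract does not give the model's potentials, so the following is only a toy illustration of the general idea of joint labelling with a pairwise CRF, not the authors' model. It assumes hypothetical speech segments (`s0`, `s1`) and face tracks (`f0`, `f1`), made-up unary costs standing in for mono-modal cluster scores, and made-up association weights standing in for the AV association strength. The energy adds a Potts-style penalty, weighted by association strength, whenever an associated speech/face pair receives different identities; MAP inference is done by exhaustive search, which is only feasible for tiny examples.

```python
import itertools

# Hypothetical toy setup (not the paper's model or data): two speech
# segments and two face tracks, each labelled with one of two identities.
NODES = ["s0", "s1", "f0", "f1"]
PEOPLE = ["A", "B"]

# Hypothetical unary costs from mono-modal clustering (lower = better).
# Audio alone slightly prefers the wrong identity for speech segment s1.
UNARY = {
    "s0": {"A": 0.2, "B": 1.5},
    "s1": {"A": 0.6, "B": 0.9},
    "f0": {"A": 0.1, "B": 2.0},
    "f1": {"A": 2.0, "B": 0.1},
}

# Hypothetical AV association strengths between co-occurring speech
# segments and face tracks (assumed to lie in [0, 1]).
ASSOC = {
    ("s0", "f0"): 0.9,
    ("s1", "f1"): 0.8,
}


def energy(labels):
    """Sum of unary costs plus a Potts penalty, weighted by association
    strength, for each associated speech/face pair whose labels differ."""
    e = sum(UNARY[n][labels[n]] for n in NODES)
    for (s, f), w in ASSOC.items():
        if labels[s] != labels[f]:
            e += w
    return e


def map_labeling():
    """Exhaustive MAP inference over all joint labelings (toy sizes only)."""
    best, best_e = None, float("inf")
    for combo in itertools.product(PEOPLE, repeat=len(NODES)):
        labels = dict(zip(NODES, combo))
        e = energy(labels)
        if e < best_e:
            best, best_e = labels, e
    return best, best_e


def audio_only():
    """Label each speech segment from its unary cost alone (no AV links)."""
    return {n: min(PEOPLE, key=lambda p: UNARY[n][p]) for n in ("s0", "s1")}


if __name__ == "__main__":
    print("audio only:", audio_only())
    labels, e = map_labeling()
    print("joint MAP:", labels, round(e, 2))
```

In this toy case the audio-only labelling assigns `s1` to person A, while the joint MAP labelling moves it to person B because the strongly detected face track `f1` and the association weight outweigh the small audio preference. This mirrors, in miniature, the kind of correction of mono-modal speaker errors that the abstract reports.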

Keywords

Audiovisual diarization; Conditional Random Field

Domains

Computation and Language [cs.CL]

Main file

GayKhouryMeignierOdobezDeleglise-CRF-AV-Diarization-ICASSP-2014.pdf (2.26 MB)

Origin: Files produced by the author(s)


https://hal.science/hal-01433223

Submitted on: Saturday, April 1, 2017, 00:44:41

Last modified on: Tuesday, December 8, 2020, 09:44:14

Long-term archiving on: Sunday, July 2, 2017, 12:20:16

Dates and versions

hal-01433223 , version 1 (01-04-2017)

Identifiers

HAL Id: hal-01433223, version 1
DOI: 10.1109/ICASSP.2014.6853569

Cite

Paul Gay, Elie Khoury, Sylvain Meignier, Jean-Marc Odobez, Paul Deléglise. A conditional random field approach for audio-visual people diarization. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), 2014, Florence, Italy. pp. 116-120, ⟨10.1109/ICASSP.2014.6853569⟩. ⟨hal-01433223⟩


Collections

UNIV-LEMANS LIUM LIUM-LST ANR
