Audiovisual speaker diarization of TV series

Abstract: Speaker diarization may be difficult to achieve when applied to narrative films, where speakers usually talk in adverse acoustic conditions: background music, sound effects, and wide variations in intonation may mask inter-speaker variability and make audio-based speaker diarization approaches error-prone. On the other hand, such fictional movies exhibit strong regularities at the image level, particularly within dialogue scenes. In this paper, we propose to perform speaker diarization within dialogue scenes of TV series by combining the audio and video modalities: speaker diarization is first performed with each modality separately; the two resulting partitions of the instance set are then optimally matched; finally, the remaining instances, corresponding to cases of disagreement between the two modalities, are processed. The results obtained by applying this multi-modal approach to fictional films outperform those obtained by relying on a single modality.
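The matching step described in the abstract, pairing the clusters found independently in the audio and video streams, can be illustrated with a standard assignment solver. The sketch below is not the authors' implementation: it assumes both modalities label the same set of speech turns, builds a cluster-overlap matrix, and uses the Hungarian algorithm (scipy.optimize.linear_sum_assignment) to find the one-to-one mapping that maximizes agreement; turns whose mapped labels still disagree are flagged for a later resolution step.

    # Illustrative sketch only: optimal matching of two diarization partitions.
    # The instance set, labels, and overlap criterion are assumptions, not the
    # paper's actual formulation.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_partitions(audio_labels, video_labels):
        """Map each video cluster onto an audio cluster so that the number of
        co-assigned instances (speech turns) is maximized."""
        audio_labels = np.asarray(audio_labels)
        video_labels = np.asarray(video_labels)
        a_ids = np.unique(audio_labels)
        v_ids = np.unique(video_labels)

        # Contingency matrix: overlap[i, j] = number of instances assigned to
        # audio cluster i and video cluster j.
        overlap = np.array([[np.sum((audio_labels == a) & (video_labels == v))
                             for v in v_ids] for a in a_ids])

        # Hungarian algorithm maximizes total overlap (by minimizing its negation).
        rows, cols = linear_sum_assignment(-overlap)
        mapping = {str(v_ids[c]): int(a_ids[r]) for r, c in zip(rows, cols)}

        # Instances whose mapped video cluster agrees with the audio cluster are
        # kept; the disagreements would be handled in a subsequent step.
        agreed = np.array([mapping.get(str(v), -1) == a
                           for a, v in zip(audio_labels, video_labels)])
        return mapping, agreed

    # Toy usage: six speech turns clustered independently by each modality.
    audio = [0, 0, 1, 1, 2, 2]
    video = ['A', 'A', 'B', 'C', 'C', 'C']
    mapping, agreed = match_partitions(audio, video)
    print(mapping)   # {'A': 0, 'B': 1, 'C': 2}
    print(agreed)    # disagreement flags for the remaining instances

The paper casts this matching as an optimization over the two partitions; the sketch only mirrors that high-level idea on toy data.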

https://hal.archives-ouvertes.fr/hal-01313080

Citation

Xavier Bost, Georges Linarès, Serigne Gueye. Audiovisual speaker diarization of TV series. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2015, Brisbane, Australia. pp. 4799-4803, ⟨10.1109/ICASSP.2015.7178882⟩. ⟨hal-01313080v2⟩
