Lexical speaker identification in TV shows

Anindya Roy; Hervé Bredin; William Hartmann; Viet Bac Le; Claude Barras; Jean-Luc Gauvain

doi:10.1007/s11042-014-1940-3

Journal article, Multimedia Tools and Applications, Year: 2015

Anindya Roy

Role: Author

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Hervé Bredin

Role: Author
PersonId : 15856
IdHAL : hbredin
ORCID : 0000-0002-3739-925X
IdRef : 121165779

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

William Hartmann

Role: Author

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Viet Bac Le

Role: Author

Vocapia Research [Orsay]

Claude Barras

Role: Author
PersonId : 17217
IdHAL : claude-barras
IdRef : 165065583

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Jean-Luc Gauvain

Role: Author

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Abstract

Lexical information extracted from speech transcripts can be used for speaker identification (SID), either on its own or, through fusion, to improve the performance of standard cepstral-based SID systems. This has previously been established mainly on isolated speech from single speakers (NIST SRE corpora, parliamentary speeches). In contrast, this work applies lexical SID approaches to a different type of data: the REPERE corpus, which consists of unsegmented multiparty conversations, mostly debates, discussions and Q&A sessions from TV shows. The hypothesis is that people give out clues to their identity when speaking in such settings, and this work aims to exploit those clues. The impact on SID performance of the diarization front-end required to pre-process the unsegmented data is also measured. Four lexical SID approaches are studied, including TFIDF, BM25 and LDA-based topic modeling. Results are analysed in terms of TV shows and speaker roles. Lexical approaches achieve low error rates for certain speaker roles such as anchors and journalists, sometimes lower than a standard cepstral-based Gaussian Supervector-Support Vector Machine (GSV-SVM) system. In certain cases, score-level sum fusion with the lexical system also yields a modest improvement over the cepstral-based system alone. To highlight the potential of lexical information not just to improve upon cepstral-based SID systems but as an independent approach in its own right, initial studies on crossmedia SID are briefly reported. Instead of using speech data, as all cepstral systems require, this approach uses Wikipedia texts to train lexical speaker models, which are then tested on speech transcripts to identify speakers.
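As a rough illustration of the general idea, and not the authors' actual system, the following minimal sketch builds one TF-IDF vector per speaker from training text, scores a test transcript by cosine similarity, and combines lexical and cepstral scores by simple summation. All function names, the smoothed IDF formula, the toy texts and the toy scores are assumptions made for this example; the abstract does not specify these details, and a real system would normally normalise scores before fusion.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_idf(docs):
    # Smoothed IDF over the per-speaker training documents (an assumed variant).
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def tfidf_vector(tokens, idf):
    # Raw term frequency times IDF; words unseen in training are dropped.
    tf = Counter(tokens)
    return {w: count * idf[w] for w, count in tf.items() if w in idf}


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def identify(transcript, speaker_texts):
    # Returns the best-scoring speaker name and the full score table.
    names = list(speaker_texts)
    docs = [tokenize(speaker_texts[name]) for name in names]
    idf = build_idf(docs)
    models = {name: tfidf_vector(doc, idf) for name, doc in zip(names, docs)}
    query = tfidf_vector(tokenize(transcript), idf)
    scores = {name: cosine(query, model) for name, model in models.items()}
    return max(scores, key=scores.get), scores


def sum_fusion(lexical_scores, cepstral_scores):
    # Score-level sum fusion: add the two systems' scores per speaker.
    # Assumes both score sets are already on comparable scales.
    return {name: lexical_scores[name] + cepstral_scores[name]
            for name in lexical_scores}


# Toy training texts (invented for illustration only).
speaker_texts = {
    "anchor": "good evening welcome to the show tonight our guests debate the budget",
    "economist": "the budget deficit and inflation depend on interest rates and growth",
}
best, lexical = identify("welcome back to the show good evening", speaker_texts)

# Toy cepstral scores (invented) fused with the lexical scores.
fused = sum_fusion(lexical, {"anchor": 0.4, "economist": 0.3})
print(best, max(fused, key=fused.get))
```

The same scoring loop would apply in the crossmedia setting described above if the per-speaker training texts came from written sources such as Wikipedia rather than from speech transcripts.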

Domains

Computer Science [cs], Multimedia [cs.MM]

Main file

paper_v0.pdf (200.4 KB)

Origin: Files produced by the author(s)

Contributor: Claude Barras

https://hal.science/hal-01690342

Submitted on: Monday, 22 January 2018, 23:08:47

Last modified on: Saturday, 7 October 2023, 21:36:20

Long-term archiving on: Thursday, 24 May 2018, 11:05:37

Dates and versions

hal-01690342 , version 1 (22-01-2018)

Identifiers

HAL Id : hal-01690342 , version 1
DOI : 10.1007/s11042-014-1940-3

Cite

Anindya Roy, Hervé Bredin, William Hartmann, Viet Bac Le, Claude Barras, et al. Lexical speaker identification in TV shows. Multimedia Tools and Applications, 2015, 74 (4), pp. 1377-1396. ⟨10.1007/s11042-014-1940-3⟩. ⟨hal-01690342⟩

Collections

CNRS LIMSI UNIV-PARIS-SACLAY SORBONNE-UNIVERSITE LISN GS-ENGINEERING GS-COMPUTER-SCIENCE

69 views

225 downloads
