Improving Speaker Identification in TV-shows using person name detection in overlaid text and speech

Abstract : This paper investigates the use of auxiliary information to help a classical acoustic-based speaker identification system in the specific context of TV shows. The underlying assumption is that auxiliary information could help (1) to re-rank the n-best speaker hypotheses provided by the acoustic-only speaker identification system, (2) to provide a confidence score to refine the rejection process (open-set identification task), and (3) to identify speakers not covered by the speaker dictionary used by the speaker identification system (out-of-dictionary speakers; full-set verification task). This last point is one of the main issues when dealing with TV shows. In this paper, the auxiliary information is based on person names detected in overlaid text and speech. Experiments conducted on three different datasets from the REPERE evaluation campaign highlight the value of this auxiliary information, notably the use of overlaid person names to identify out-of-dictionary speakers, confirming the key assumptions made.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01314607
Contributor : Bibliothèque Universitaire Déposants Hal-Avignon
Submitted on : Wednesday, May 11, 2016 - 4:17:14 PM
Last modification on : Friday, March 29, 2019 - 2:36:04 PM

Identifiers

  • HAL Id : hal-01314607, version 1

Citation

Delphine Charlet, Corinne Fredouille, Géraldine Damnati, Gregory Senay. Improving Speaker Identification in TV-shows using person name detection in overlaid text and speech. Interspeech, Aug 2013, Lyon, France. ⟨hal-01314607⟩
