Speaker Change Detection in Broadcast TV Using Bidirectional Long Short-Term Memory Networks

Abstract : Speaker change detection is an important step in a speaker diarization system. It aims at finding speaker change points in the audio stream. In this paper, it is treated as a sequence labeling task and addressed with bidirectional long short-term memory networks (Bi-LSTM). The system is trained and evaluated on the broadcast TV subset of the ETAPE database. The results show that the proposed model improves on conventional methods based on BIC and Gaussian divergence. For instance, in comparison to Gaussian divergence, it produces speech turns that are 19.5% longer on average, with the same level of purity.
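To illustrate what treating change detection as a sequence labeling task can look like, the sketch below turns a list of speaker change times into one binary label per audio frame, which is the kind of target a sequence model such as a Bi-LSTM could be trained to predict. It is a minimal illustration only: the function name `change_points_to_labels`, the 0.1 s frame step, and the 0.05 s tolerance are assumptions made for this example and are not taken from the paper.

```python
def change_points_to_labels(change_points, duration, step=0.1, tolerance=0.05):
    """Return one 0/1 label per frame; 1 marks frames close to a speaker change.

    All parameters are hypothetical choices for illustration:
    change_points -- change times in seconds
    duration      -- length of the audio in seconds
    step          -- spacing between frames in seconds
    tolerance     -- frames within this distance of a change are labeled 1
    """
    n_frames = int(round(duration / step))
    labels = [0] * n_frames
    for i in range(n_frames):
        t = i * step  # time of frame i, in seconds
        if any(abs(t - cp) <= tolerance for cp in change_points):
            labels[i] = 1
    return labels


# Two speaker changes (at 0.5 s and 1.2 s) in a 2-second excerpt.
labels = change_points_to_labels([0.5, 1.2], duration=2.0)
print(labels)
```

With these settings only the frames at 0.5 s and 1.2 s are labeled as changes. A wider tolerance would mark more frames as positive, which is one way to reduce the imbalance between change and non-change frames.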
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01690244
Contributor : Claude Barras
Submitted on : Tuesday, January 23, 2018 - 5:03:08 PM
Last modification on : Thursday, June 20, 2019 - 4:34:11 PM
Long-term archiving on : Thursday, May 24, 2018 - 9:49:56 AM

File

0065.PDF
Publisher files allowed on an open archive

Citation

Ruiqing Yin, Hervé Bredin, Claude Barras. Speaker Change Detection in Broadcast TV Using Bidirectional Long Short-Term Memory Networks. Interspeech 2017, Aug 2017, Stockholm, Sweden. ⟨10.21437/Interspeech.2017-65⟩. ⟨hal-01690244⟩
