Deep complementary features for speaker identification in TV broadcast data

Abstract : This work investigates a Convolutional Neural Network approach and its fusion with more traditional systems, such as Total Variability Space, for speaker identification in TV broadcast data. The former is trained on spectrograms, while the latter is based on MFCC features. The dataset poses several challenges, such as significant class imbalance, background noise, and music. Although the Convolutional Neural Network alone performs below the state-of-the-art system, it complements that system, and their fusion gives better results. Different fusion techniques are evaluated using both early and late fusion.
Document type :
Conference papers

Cited literature [24 references]

https://hal.archives-ouvertes.fr/hal-01350068
Contributor : Laurent Besacier
Submitted on : Friday, July 29, 2016 - 3:30:20 PM
Last modification on : Thursday, April 4, 2019 - 10:18:05 AM

File

odyssey-deep-complementary.pdf
Files produced by the author(s)

Identifiers

HAL Id : hal-01350068
DOI : 10.21437/Odyssey.2016-21

Citation

Mateusz Budnik, Laurent Besacier, Ali Khodabakhsh, Cenk Demiroglu. Deep complementary features for speaker identification in TV broadcast data. Odyssey Workshop 2016, Jun 2016, Bilbao, Spain. ⟨10.21437/Odyssey.2016-21⟩. ⟨hal-01350068⟩
