Deep complementary features for speaker identification in TV broadcast data

Abstract: This work investigates the use of a Convolutional Neural Network approach and its fusion with more traditional systems, such as the Total Variability Space, for speaker identification in TV broadcast data. The former uses spectrograms for training, while the latter is based on MFCC features. The dataset poses several challenges, such as significant class imbalance and background noise and music. Even though the performance of the Convolutional Neural Network alone is lower than the state of the art, it complements it and yields better results through fusion. Different fusion techniques are evaluated, covering both early and late fusion.
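
To make the fusion idea concrete, below is a minimal, hypothetical sketch of score-level (late) fusion between a CNN-based system and a Total Variability (i-vector) system. The score arrays, the min-max normalization, and the weight alpha are illustrative assumptions, not the exact procedure evaluated in the paper.

```python
# Illustrative sketch of late (score-level) fusion of two speaker-ID back-ends.
# The normalization and weighting scheme are assumptions for illustration only.
import numpy as np

def normalize(scores):
    """Min-max normalize per-speaker scores so both systems share a common scale."""
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo + 1e-12)

def late_fusion(cnn_scores, tvs_scores, alpha=0.3):
    """Weighted-sum fusion; alpha is a hypothetical weight on the CNN system."""
    return alpha * normalize(cnn_scores) + (1.0 - alpha) * normalize(tvs_scores)

# Example: scores of one test segment against 5 candidate speakers.
cnn = np.array([0.10, 0.55, 0.20, 0.05, 0.10])   # CNN posterior-like scores
tvs = np.array([1.2, 3.4, 2.9, 0.4, 1.1])        # i-vector back-end scores
fused = late_fusion(cnn, tvs)
print("Predicted speaker index:", int(np.argmax(fused)))
```

In such a scheme the fusion weight would typically be tuned on a held-out development set; early fusion, by contrast, combines the features or intermediate representations of both systems before classification.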
Document type:
Conference paper
Odyssey Workshop 2016, June 2016, Bilbao, Spain. Odyssey 2016. DOI: 10.21437/Odyssey.2016-21

Cited literature: 24 references

https://hal.archives-ouvertes.fr/hal-01350068
Contributor: Laurent Besacier
Submitted on: Friday, 29 July 2016, 15:30:20
Last modified on: Thursday, 11 January 2018, 16:30:43

File

odyssey-deep-complementary.pdf
Files produced by the author(s)

Citation

Mateusz Budnik, Laurent Besacier, Ali Khodabakhsh, Cenk Demiroglu. Deep complementary features for speaker identification in TV broadcast data. Odyssey Workshop 2016, June 2016, Bilbao, Spain. Odyssey 2016. DOI: 10.21437/Odyssey.2016-21. HAL: hal-01350068.
