OCR-aided person annotation and label propagation for speaker modeling in TV shows

Abstract: In this paper, we present an approach for minimizing human effort in manual speaker annotation. Label propagation is used at each iteration of an active learning cycle. More precisely, a selection strategy for choosing the most suitable speech track to be labeled is proposed. Four different selection strategies are evaluated, and all the tracks in a corresponding cluster are gathered using agglomerative clustering in order to propagate human annotations. To further reduce the manual labor required, an optical character recognition system is used to bootstrap annotations. At each step of the cycle, annotations are used to build speaker models. The quality of the generated speaker models is evaluated at each step using an i-vector based speaker identification system. The presented approach shows promising results on the REPERE corpus with a minimum amount of human effort for annotation.
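
The cycle described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the agglomerative clustering has already been run and is given as a track-to-cluster mapping, it replaces the four evaluated selection strategies (which the abstract does not detail) with one hypothetical rule that queries the largest cluster still containing unlabeled tracks, and it leaves out speaker model training and i-vector evaluation. The names `propagate`, `select_track`, `annotate`, `oracle` and `budget` are illustrative only.

```python
from collections import Counter


def propagate(labels, clusters, track, name):
    """Give `name` to every still-unlabeled track in the cluster of `track`."""
    target = clusters[track]
    for other, cluster_id in clusters.items():
        if cluster_id == target and other not in labels:
            labels[other] = name


def select_track(labels, clusters):
    """Hypothetical strategy: pick a track from the cluster that holds
    the most unlabeled tracks (ties go to the smallest cluster id)."""
    sizes = Counter(c for t, c in clusters.items() if t not in labels)
    if not sizes:
        return None  # every track already has a label
    biggest = max(sorted(sizes), key=lambda c: sizes[c])
    return min(t for t, c in clusters.items()
               if c == biggest and t not in labels)


def annotate(clusters, ocr_labels, oracle, budget):
    """Bootstrap with OCR names, then query the annotator `budget` times."""
    labels = {}
    # Bootstrap step: names read by OCR are propagated before any human query.
    for track, name in ocr_labels.items():
        propagate(labels, clusters, track, name)
    queries = 0
    while queries < budget:
        track = select_track(labels, clusters)
        if track is None:
            break
        # `oracle` stands in for the human annotator.
        propagate(labels, clusters, track, oracle(track))
        queries += 1
        # In the paper's cycle, speaker models would be rebuilt from
        # `labels` and evaluated at this point; omitted here.
    return labels, queries
```

In this sketch one human answer labels a whole cluster, which is where the reduction in annotation effort comes from; the price is that clustering errors are propagated along with the label.
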
Document type:
Conference paper
IEEE ICASSP 2016, Mar 2016, Shanghai, China. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2016, 〈10.1109/ICASSP.2016.7472743〉
Cited literature [17 references]

https://hal.archives-ouvertes.fr/hal-01350071
Contributor: Laurent Besacier
Submitted on: Friday, July 29, 2016 - 15:34:47
Last modified on: Saturday, December 15, 2018 - 01:49:51

File

ocr-aided-person-4.pdf
Files produced by the author(s)

Citation

Mateusz Budnik, Laurent Besacier, Ali Khodabakhsh, Cenk Demiroglu. OCR-aided person annotation and label propagation for speaker modeling in TV shows. IEEE ICASSP 2016, Mar 2016, Shanghai, China. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2016, 〈10.1109/ICASSP.2016.7472743〉. 〈hal-01350071〉
