First investigations on self trained speaker diarization

Abstract: This paper investigates self-trained cross-show speaker diarization applied to collections of French TV archives, based on an i-vector/PLDA framework. The parameters used for i-vector extraction and PLDA scoring are trained in an unsupervised way, using the data of the collection itself. Performance is compared across combinations of target data and external data used for training. The experimental results on two distinct target corpora show that using data from the corpora themselves to perform unsupervised iterative training and domain adaptation of PLDA parameters can improve an existing system trained on external annotated data. Such results indicate that performing speaker indexation on small collections of unlabeled audio archives should only rely on the availability of a sufficient external corpus, which can be specifically adapted to each target collection. We show that a minimum collection size is required to dispense with such an external bootstrap.
Document type:
Conference paper
Speaker and Language Recognition Workshop (Speaker Odyssey), Jun 2016, Bilbao, Spain. Odyssey 2016 Proceedings, pp.152-157, 2016

https://hal.archives-ouvertes.fr/hal-01433173
Contributor: Sylvain Meignier
Submitted on: Friday, March 24, 2017 - 22:59:42
Last modified on: Tuesday, June 19, 2018 - 11:50:04
Document(s) archived on: Sunday, June 25, 2017 - 12:32:14

File

50.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-01433173, version 1

Citation

Gaël Le Lan, Sylvain Meignier, Delphine Charlet, Anthony Larcher. First investigations on self trained speaker diarization. Speaker and Language Recognition Workshop (Speaker Odyssey), Jun 2016, Bilbao, Spain. Odyssey 2016 Proceedings, pp.152-157, 2016. 〈hal-01433173〉

Metrics

Record views

264

File downloads

50