Semi-Supervised and Unsupervised Data Extraction Targeting Speakers: From Speaker Roles to Fame?

Abstract: Speaker identification relies on classification methods and acoustic models. Acoustic models are learned from audio data of the speakers to be modeled, but recording and annotating such data is time-consuming and labor-intensive. In this paper we propose to use data available on video-sharing websites such as YouTube and Dailymotion to learn speaker-specific acoustic models. This process raises two questions: on the one hand, which speakers can be identified through this kind of knowledge and, on the other hand, how can these data be extracted from a corpus as noisy as the Web? Two approaches are considered for extracting and annotating the data: the first is semi-supervised and requires a human annotator to control the process; the second is fully unsupervised. Speaker models created with the proposed approaches were evaluated on the REPERE 2012 TV shows test corpus. The identification results are analyzed in terms of speaker roles and fame, a subjective concept introduced to estimate how easily a speaker can be modeled.
Document type:
Conference paper
Interspeech satellite workshop on Speech, Language and Audio in Multimedia (SLAM), 2013, Marseille, France. Proceedings of the First Workshop on Speech, Language and Audio in Multimedia (SLAM), 2013

Cited literature: 18 references

https://hal.archives-ouvertes.fr/hal-01433450
Contributor: Sylvain Meignier
Submitted on: Saturday, April 1, 2017 - 00:52:15
Last modified on: Thursday, April 6, 2017 - 10:08:42
Document(s) archived on: Sunday, July 2, 2017 - 12:13:06

File

paper-14.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-01433450, version 1

Citation

Carole Lailler, Grégor Dupuy, Mickael Rouvier, Sylvain Meignier. Semi-Supervised and Unsupervised Data Extraction Targeting Speakers: From Speaker Roles to Fame?. Interspeech satellite workshop on Speech, Language and Audio in Multimedia (SLAM), 2013, Marseille, France. Proceedings of the First Workshop on Speech, Language and Audio in Multimedia (SLAM), 2013. 〈hal-01433450〉

Metrics

Record views

103

File downloads

17