Semi-Supervised and Unsupervised Data Extraction Targeting Speakers: From Speaker Roles to Fame?

Carole Lailler; Grégor Dupuy; Mickael Rouvier; Sylvain Meignier

Conference paper, Year: 2013

Carole Lailler

Role: Author
PersonId : 766467
IdRef : 161865755

Laboratoire d'Informatique de l'Université du Mans

Grégor Dupuy

Role: Author
PersonId : 776540
IdRef : 188635548

Laboratoire d'Informatique de l'Université du Mans

Mickael Rouvier

Role: Author

Laboratoire d'Informatique de l'Université du Mans

Sylvain Meignier

Role: Author
PersonId : 11674
IdHAL : sylvain-meignier
ORCID : 0000-0001-7687-073X
IdRef : 182269086

Laboratoire d'Informatique de l'Université du Mans

Abstract

Speaker identification relies on classification methods and acoustic models. Acoustic models are learned from audio data of the speakers to be modeled. However, recording and annotating such data is time-consuming and labor-intensive. In this paper, we propose using data available on video-sharing websites such as YouTube and Dailymotion to learn speaker-specific acoustic models. This process raises two questions: on the one hand, which speakers can be identified using this kind of knowledge, and, on the other hand, how to extract these data from a corpus as noisy as the Web. Two approaches are considered for extracting and annotating the data: the first is semi-supervised and requires a human annotator to control the process; the second is fully unsupervised. Speaker models created with the proposed approaches were evaluated on the REPERE 2012 TV shows test corpus. The identification results are analyzed in terms of speaker roles and fame, a subjective concept introduced to estimate how easily a speaker can be modeled.

Keywords

speaker identification; JFA; semi- and unsupervised speaker modeling; speaker roles; fame

Domains

Computation and Language [cs.CL]

Main file

paper-14.pdf (3.57 MB)

Origin: Publisher files allowed on an open archive

https://hal.science/hal-01433450

Submitted on: Saturday, April 1, 2017, 00:52:15

Last modified on: Tuesday, December 8, 2020, 09:44:18

Long-term archiving on: Sunday, July 2, 2017, 12:13:06

Dates and versions

hal-01433450, version 1 (01-04-2017)

Identifiers

HAL Id: hal-01433450, version 1

Cite

Carole Lailler, Grégor Dupuy, Mickael Rouvier, Sylvain Meignier. Semi-Supervised and Unsupervised Data Extraction Targeting Speakers: From Speaker Roles to Fame? Interspeech satellite workshop on Speech, Language and Audio in Multimedia (SLAM), 2013, Marseille, France. ⟨hal-01433450⟩

Collections

UNIV-LEMANS LIUM LIUM-LST ANR
