Segmentation et Regroupement en Locuteur pour le traitement incrémental des collections volumineuses

Grégor Dupuy; Sylvain Meignier; Yannick Estève

Conference paper, Year: 2012

Cross-show Speaker Diarization to Incrementally Process Large Volume of Data

Grégor Dupuy

Role: Author
PersonId: 776540
IdRef: 188635548

Laboratoire d'Informatique de l'Université du Mans

Sylvain Meignier

Role: Author
PersonId: 11674
IdHAL: sylvain-meignier
ORCID: 0000-0001-7687-073X
IdRef: 182269086

Laboratoire d'Informatique de l'Université du Mans

Yannick Estève

Role: Author
PersonId: 11645
IdHAL: yannick-esteve
ORCID: 0000-0002-3656-8883
IdRef: 070531668

Laboratoire d'Informatique de l'Université du Mans

Abstract

Current cross-show speaker diarization systems mainly rely on a global clustering process that handles all the shows of a collection jointly. This approach has already been studied in various situations and so far seems to be the best way to achieve low error rates within a reasonable processing time. Nevertheless, it shows its limits in a realistic application context, where large and dynamically growing collections have to be processed. In this paper, we investigate an incremental cross-show speaker diarization architecture that iteratively processes new shows to be inserted into an existing collection. The new shows are processed one after another, in chronological order of broadcast. Experiments were conducted on the LCP and BFMTV show recordings distributed during the French ETAPE and REPERE evaluation campaigns. The data amount to 67 hours of annotated recordings, spread over 310 shows and covering a period of about two years (from September 2010 to October 2012).
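
The incremental insertion described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the abstract and keywords mention ILP clustering and i-vectors, whereas this sketch swaps in a simple greedy cosine-similarity match so that it stays short and runnable. The names `Show`, `insert_show`, `process_incrementally`, the `threshold` value, and the two-dimensional vectors are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Show:
    name: str
    broadcast_date: str                 # ISO date, so string order is chronological
    clusters: dict[str, list[float]]    # local speaker label -> speaker vector (hypothetical)


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def insert_show(collection, show, threshold=0.8):
    """Map each local speaker of `show` to a collection-wide speaker id.

    Greedy stand-in for the clustering step (assumption, not the paper's ILP):
    a local speaker joins the most similar known speaker if the similarity
    reaches `threshold`, otherwise a new collection-wide speaker is created.
    """
    mapping = {}
    new_entries = {}
    for label, vec in show.clusters.items():
        best_id, best_sim = None, threshold
        # Compare only against speakers known before this show was inserted,
        # so two new speakers of the same show are never merged together.
        for spk_id, ref in collection.items():
            sim = cosine(vec, ref)
            if sim >= best_sim:
                best_id, best_sim = spk_id, sim
        if best_id is None:
            best_id = f"spk{len(collection) + len(new_entries)}"
            new_entries[best_id] = vec
        mapping[label] = best_id
    collection.update(new_entries)
    return mapping


def process_incrementally(shows, threshold=0.8):
    """Insert shows one after another, in chronological order of broadcast."""
    collection = {}   # collection-wide speaker id -> reference vector
    result = {}
    for show in sorted(shows, key=lambda s: s.broadcast_date):
        result[show.name] = insert_show(collection, show, threshold)
    return result
```

The point of the sketch is the control flow: earlier shows are never reprocessed, and each new show is matched only against the speakers already present in the collection. A global approach would instead recluster every show whenever the collection grows.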

Keywords

speaker diarization, incremental architecture, cross-show ILP clustering, i-vectors

Domains

Computation and Language [cs.CL]

Main file

42.pdf (240.52 KB)

Origin: Publisher files allowed on an open archive

https://hal.science/hal-01433245

Submitted on: Friday, April 7, 2017, 09:12:01

Last modified on: Tuesday, December 8, 2020, 09:44:18

Long-term archiving on: Saturday, July 8, 2017, 12:27:47

Dates and versions

hal-01433245, version 1 (07-04-2017)

Identifiers

HAL Id: hal-01433245, version 1

Cite

Grégor Dupuy, Sylvain Meignier, Yannick Estève. Segmentation et Regroupement en Locuteur pour le traitement incrémental des collections volumineuses. 30e Journées d’Études sur la Parole (JEP'14), 2014, Le Mans, France. pp. 433-440. ⟨hal-01433245⟩

Collections

UNIV-LEMANS LIUM LIUM-LST ANR
