SPEAKER EMBEDDINGS FOR DIARIZATION OF BROADCAST DATA IN THE ALLIES CHALLENGE

Anthony Larcher; Ambuj Mehrish; Marie Tahon; Sylvain Meignier; Jean Carrive; David Doukhan; Olivier Galibert; Nicholas Evans

Conference paper, Year: 2021


Anthony Larcher

Role: Author
PersonId : 20105
IdHAL : anthony-larcher
ORCID : 0000-0003-4398-0224
IdRef : 139544569

Laboratoire d'Informatique de l'Université du Mans

Ambuj Mehrish

Role: Author

Laboratoire d'Informatique de l'Université du Mans

Marie Tahon

Role: Author
PersonId : 9821
IdHAL : marie-tahon
ORCID : 0000-0002-6782-0332
IdRef : 165065532

Laboratoire d'Informatique de l'Université du Mans

Sylvain Meignier

Role: Author

Laboratoire d'Informatique de l'Université du Mans

Jean Carrive

Role: Author
PersonId : 844613

Institut National de l'Audiovisuel

David Doukhan

Role: Author
PersonId : 1006987

Institut National de l'Audiovisuel

Olivier Galibert

Role: Author

Laboratoire National de Métrologie et d'Essais [Trappes]

Nicholas Evans

Role: Author
PersonId : 938450

Eurecom [Sophia Antipolis]

Abstract

Diarization consists of segmenting speech signals and clustering homogeneous speaker segments. State-of-the-art systems typically operate on speaker embeddings, such as i-vectors or neural x-vectors, extracted from mel-frequency cepstral coefficients (MFCCs) or spectrograms. The recent SincNet architecture extracts x-vectors directly from raw speech signals. This paper compares the performance of different embeddings, extracted from MFCCs or from the raw signal, for speaker diarization of broadcast media treated with compression and sub-sampling, operations which typically degrade performance. Experiments are performed with the new ALLIES database, which was designed to complement existing, publicly available French corpora of broadcast radio and TV shows. Results show that, in adverse conditions with compression and sampling mismatch, SincNet x-vectors outperform i-vectors and x-vectors by 43% and 73% relative diarization error rate (DER), respectively. Additionally, we found that SincNet x-vectors are not the absolute best embeddings but are more robust to data mismatch than the others.
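As an illustration of how a relative DER figure such as those quoted above is commonly read, the following minimal sketch computes the relative reduction of one system's DER over a baseline. The function name and the DER values are hypothetical and are not taken from the paper; the abstract does not give absolute DERs, and it is only an assumption that the reported percentages are relative reductions of this kind.

```python
def relative_der_reduction(baseline_der: float, system_der: float) -> float:
    """Return the relative DER reduction of a system over a baseline, in percent."""
    if baseline_der <= 0:
        raise ValueError("baseline DER must be positive")
    return 100.0 * (baseline_der - system_der) / baseline_der


# Hypothetical values, for illustration only (not results from the paper).
baseline = 20.0  # DER (%) of a hypothetical baseline embedding
improved = 15.0  # DER (%) of a hypothetical more robust embedding
print(relative_der_reduction(baseline, improved))
```

Under this reading, a relative figure says nothing about the absolute error rate: the same percentage can correspond to very different absolute DERs depending on the baseline.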

Keywords

Speaker diarization; x-vectors; i-vectors; Sinc-Net; raw signal

Domains

Artificial intelligence [cs.AI]

Main file

ICASSP_2021_ALLIES_submitted.pdf (152.18 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-03262914

Submitted on: Wednesday, 16 June 2021, 18:05:42

Last modified on: Friday, 19 August 2022, 11:18:39

Long-term archiving on: Friday, 17 September 2021, 19:25:18

Dates and versions

hal-03262914 , version 1 (16-06-2021)

Identifiers

HAL Id : hal-03262914 , version 1

Cite

Anthony Larcher, Ambuj Mehrish, Marie Tahon, Sylvain Meignier, Jean Carrive, et al. SPEAKER EMBEDDINGS FOR DIARIZATION OF BROADCAST DATA IN THE ALLIES CHALLENGE. ICASSP, Jun 2021, Toronto, Canada. ⟨hal-03262914⟩

Collections

UNIV-LEMANS EURECOM LIUM LIUM-LST LNE ANR
