ODESSA at Albayzin Speaker Diarization Challenge 2018

Jose Patino; Héctor Delgado; Ruiqing Yin; Hervé Bredin; Claude Barras; Nicholas Evans

Conference paper. Year: 2018

Jose Patino

Role: Author
PersonId: 743667
IdHAL: jose-patino
ORCID: 0000-0001-7193-0721
IdRef: 241999308

Héctor Delgado

Role: Author

Eurecom [Sophia Antipolis]

Ruiqing Yin

Role: Author

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Hervé Bredin

Role: Author
PersonId: 15856
IdHAL: hbredin
ORCID: 0000-0002-3739-925X
IdRef: 121165779

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Claude Barras

Role: Author
PersonId: 17217
IdHAL: claude-barras
IdRef: 165065583

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Nicholas Evans

Role: Author
PersonId: 938450

Eurecom [Sophia Antipolis]

Abstract

This paper describes the ODESSA submissions to the Albayzin Speaker Diarization Challenge 2018. The challenge addresses the diarization of TV shows. This work explores three different techniques to represent speech segments, namely binary key, x-vector and triplet-loss based embeddings. While training-free methods such as the binary key technique can be applied easily to a scenario where training data is limited, the training of robust neural-embedding extractors is considerably more challenging. However, when training data is plentiful (open-set condition), neural embeddings provide more robust segmentations, giving speaker representations which lead to better diarization performance. The paper also reports our efforts to improve speaker diarization performance through system combination. For systems with a common temporal resolution, fusion is performed at segment level during clustering. When the systems under fusion produce segmentations with an arbitrary resolution, they are combined at solution level. Both approaches to fusion are shown to improve diarization performance.
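The segment-level fusion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each system scores the same set of segments with a segment-to-segment similarity matrix, that fusion is a weighted average of those matrices, and that clustering is a greedy average-linkage agglomerative procedure with a stopping threshold. The function names, weights, threshold and toy scores are all hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def fuse_similarities(matrices: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted average of per-system similarity matrices (assumed fusion rule)."""
    if not matrices or len(matrices) != len(weights):
        raise ValueError("need one weight per similarity matrix")
    n = len(matrices[0])
    if any(len(m) != n or any(len(row) != n for row in m) for m in matrices):
        raise ValueError("all systems must score the same n segments")
    total = float(sum(weights))
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total for j in range(n)]
        for i in range(n)
    ]


def cluster(similarity: Matrix, threshold: float) -> List[int]:
    """Greedy average-linkage clustering; stops when the best merge scores below threshold."""
    clusters = [[i] for i in range(len(similarity))]

    def linkage(a: List[int], b: List[int]) -> float:
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, (a, b) = max(
            (linkage(clusters[a], clusters[b]), (a, b))
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    # Map each segment index to the index of the cluster that contains it.
    labels = [0] * len(similarity)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy similarity scores for four segments from two hypothetical systems.
sys_a = [[1.0, 0.9, 0.2, 0.1], [0.9, 1.0, 0.3, 0.2], [0.2, 0.3, 1.0, 0.4], [0.1, 0.2, 0.4, 1.0]]
sys_b = [[1.0, 0.7, 0.1, 0.2], [0.7, 1.0, 0.2, 0.1], [0.1, 0.2, 1.0, 0.9], [0.2, 0.1, 0.9, 1.0]]

fused = fuse_similarities([sys_a, sys_b], [0.5, 0.5])
fused_labels = cluster(fused, threshold=0.5)
single_labels = cluster(sys_a, threshold=0.5)
print(fused_labels, single_labels)
```

In this toy case, system A alone leaves segments 2 and 3 in separate clusters, while the fused scores group them together. Solution-level fusion, which combines finished segmentations of arbitrary resolution, would need a different procedure and is not covered by this sketch.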

Domains

Signal and Image Processing [eess.SP]

Contributor: Hervé Bredin

https://hal.science/hal-01987808

Submitted on: Monday, 21 January 2019, 12:36:38

Last modified on: Saturday, 7 October 2023, 21:36:20

Dates and versions

hal-01987808, version 1 (21-01-2019)

Identifiers

HAL Id: hal-01987808, version 1

Cite

Jose Patino, Héctor Delgado, Ruiqing Yin, Hervé Bredin, Claude Barras, et al. ODESSA at Albayzin Speaker Diarization Challenge 2018. IberSPEECH 2018, 2018, Barcelona, Spain. pp. 211-215. ⟨hal-01987808⟩

Collections

CNRS EURECOM LIMSI UNIV-PARIS-SACLAY SORBONNE-UNIVERSITE ANR LISN GS-ENGINEERING GS-COMPUTER-SCIENCE
