Pyannote.audio: neural building blocks for speaker diarization

Hervé Bredin; Ruiqing Yin; Juan Manuel Coria; Gregory Gelly; Pavel Korshunov; Marvin Lavechin; Diego Fustes; Hadrien Titeux; Wassim Bouaziz; Marie-Philippe Gill

Conference paper, Year: 2020


Hervé Bredin

Role: Author
PersonId: 15856
IdHAL: hbredin
ORCID: 0000-0002-3739-925X
IdRef: 121165779

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Traitement du Langage Parlé

Ruiqing Yin

Role: Author
PersonId: 1034326

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Information, Langue Ecrite et Signée

Juan Manuel Coria

Role: Author
PersonId: 179273
IdHAL: juan-manuel-coria

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Information, Langue Ecrite et Signée

Gregory Gelly

Role: Author
PersonId: 1034316

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Pavel Korshunov

Role: Author

Marvin Lavechin

Role: Author

Diego Fustes

Role: Author

Hadrien Titeux

Role: Author

Wassim Bouaziz

Role: Author

Marie-Philippe Gill

Role: Author

Abstract

We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pre-trained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding, reaching state-of-the-art performance for most of them.
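
To make the idea of combinable building blocks concrete, here is a minimal toy sketch in plain Python. It is not the pyannote.audio API: the function names (binarize, split_at_changes), the threshold, and the frame duration are all hypothetical, and it assumes a voice activity detection block emits one score per frame while a speaker change detection block yields change times in seconds. It shows only how the outputs of two such blocks could be chained into candidate speaker turns, which a later stage (for example, clustering of speaker embeddings) would then label.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def binarize(scores: List[float], threshold: float, frame_duration: float) -> List[Segment]:
    """Turn frame-level scores into segments where score >= threshold (hypothetical helper)."""
    segments: List[Segment] = []
    start: Optional[float] = None
    for i, score in enumerate(scores):
        active = score >= threshold
        if active and start is None:
            # A new active region begins at this frame.
            start = i * frame_duration
        elif not active and start is not None:
            # The active region ended just before this frame.
            segments.append((start, i * frame_duration))
            start = None
    if start is not None:
        # Close a region that runs until the last frame.
        segments.append((start, len(scores) * frame_duration))
    return segments


def split_at_changes(speech: List[Segment], change_points: List[float]) -> List[Segment]:
    """Split each speech segment at change points that fall strictly inside it (hypothetical helper)."""
    turns: List[Segment] = []
    for start, end in speech:
        cuts = [t for t in sorted(change_points) if start < t < end]
        bounds = [start] + cuts + [end]
        turns.extend(zip(bounds[:-1], bounds[1:]))
    return turns


# Made-up voice activity scores, one per 0.5 s frame, and made-up change times.
vad_scores = [0.1, 0.9, 0.8, 0.9, 0.2, 0.7, 0.9, 0.1]
speech = binarize(vad_scores, threshold=0.5, frame_duration=0.5)
turns = split_at_changes(speech, change_points=[1.0, 3.0])
print(speech)  # [(0.5, 2.0), (2.5, 3.5)]
print(turns)   # [(0.5, 1.0), (1.0, 2.0), (2.5, 3.0), (3.0, 3.5)]
```

In the toolkit itself these stages are trainable neural models rather than hand-written rules; the sketch only illustrates the chaining of their outputs.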

Keywords

speaker diarization; voice activity detection; speaker change detection; overlapped speech detection; speaker embedding

Domains

Computer Science [cs]; Computation and Language [cs.CL]


https://hal.science/hal-02995345

Submitted on: Monday, 9 November 2020, 10:12:38

Last modified on: Saturday, 7 October 2023, 21:36:20

Dates and versions

hal-02995345, version 1 (9 November 2020)

Identifiers

HAL Id: hal-02995345, version 1
arXiv: 1911.01255

Cite

Hervé Bredin, Ruiqing Yin, Juan Manuel Coria, Gregory Gelly, Pavel Korshunov, et al. Pyannote.audio: neural building blocks for speaker diarization. IEEE International Conference on Acoustics, Speech, and Signal Processing, May 2020, Barcelona, Spain. ⟨hal-02995345⟩


Collections

CNRS LIMSI UNIV-PARIS-SACLAY SORBONNE-UNIVERSITE LISN GS-ENGINEERING GS-COMPUTER-SCIENCE LISN-ILES LISN-TLP

