Journal article

Fusion methods for speech enhancement and audio source separation

Abstract: A wide variety of audio source separation techniques exist and can already tackle many challenging industrial issues. However, in contrast to other application domains, fusion principles have rarely been investigated in audio source separation despite their demonstrated potential in classification tasks. In this paper, we propose a general fusion framework that takes advantage of the diversity of existing separation techniques in order to improve separation quality. We obtain new source estimates by summing the individual estimates given by different separation techniques, weighted by a set of fusion coefficients. We investigate three alternative fusion methods, based on standard non-linear optimization, Bayesian model averaging, and deep neural networks. Experiments conducted for both speech enhancement and singing voice extraction demonstrate that all the proposed methods outperform traditional model selection. The use of deep neural networks for the estimation of time-varying coefficients notably leads to large quality improvements, up to 3 dB in terms of signal-to-distortion ratio (SDR) compared to model selection.
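The weighted-sum fusion described in the abstract can be sketched as follows. This is a minimal illustration only: the function name and the normalization of the coefficients are my own choices, and the paper additionally considers time-varying coefficients estimated by a deep neural network rather than a single fixed weight per separator.

```python
import numpy as np

def fuse_estimates(estimates, weights):
    """Fuse source estimates from several separators by a weighted sum.

    estimates: list of arrays, one per separation technique, all the same shape
               (e.g. time-domain signals or magnitude spectrograms)
    weights:   one fusion coefficient per technique (time-invariant here)
    """
    stacked = np.stack(estimates)                 # (n_techniques, ...)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                               # normalize coefficients to sum to 1
    return np.tensordot(w, stacked, axes=1)       # weighted sum over techniques

# Toy example: two "separators" disagree; uniform fusion averages them.
est_a = np.array([1.0, 2.0, 3.0])
est_b = np.array([3.0, 2.0, 1.0])
fused = fuse_estimates([est_a, est_b], [0.5, 0.5])
# fused == [2.0, 2.0, 2.0]
```

Model selection corresponds to the special case where one coefficient is 1 and the rest are 0; the fusion framework generalizes this by letting the coefficients take intermediate values.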

Cited literature: 55 references
Contributor: Xabier Jaureguiberry
Submitted on: Saturday, April 9, 2016 - 11:39:20 AM
Last modification on: Monday, May 18, 2020 - 8:24:01 PM
Document(s) archived on: Tuesday, November 15, 2016 - 12:14:14 AM





Xabier Jaureguiberry, Emmanuel Vincent, Gaël Richard. Fusion methods for speech enhancement and audio source separation. IEEE Transactions on Audio, Speech and Language Processing, 2016. ⟨10.1109/TASLP.2016.2553441⟩ ⟨hal-01120685v4⟩


