Fusion Methods for Audio Source Separation

Xabier Jaureguiberry; Emmanuel M. Vincent; Gael Richard

Report (Research Report) Year: 2014

French title: Méthodes de fusion pour la séparation de source audio


Xabier Jaureguiberry

Role: Author
PersonId : 4932
IdHAL : xabierj
IdRef : 194155412

Laboratoire Traitement et Communication de l'Information

Département Traitement du Signal et des Images

Emmanuel M. Vincent

Role: Author
PersonId : 1256
IdHAL : emmanuelv
ORCID : 0000-0002-0183-7289
IdRef : 089360176

Analysis, perception and recognition of speech

Gael Richard

Role: Author
PersonId : 14146
IdHAL : gael-richard
IdRef : 094977208

Laboratoire Traitement et Communication de l'Information

Département Traitement du Signal et des Images

Abstract

A wide variety of audio source separation techniques exist and can already tackle many challenging industrial problems. However, in contrast to other application domains, fusion principles have rarely been investigated in audio source separation, despite their demonstrated potential in classification tasks. In this paper, we propose a general fusion framework that takes advantage of the diversity of existing separation techniques to improve separation quality. Our approaches obtain a new source estimate by summing the individual estimates given by different separation techniques, weighted by a set of fusion coefficients. We investigate three alternative fusion methods, based on standard non-linear optimization, Bayesian model averaging, and deep neural networks, respectively. Experiments conducted on both speech enhancement and singing-voice extraction show that the proposed methods differ in separation performance, yet all outperform traditional model selection. In particular, using deep neural networks to estimate time-varying coefficients improves the signal-to-distortion ratio (SDR) by up to 3.3 dB compared to model selection. As such, our fusion framework is a practical and efficient way to remove the need to choose and carefully tune a single separation system, and it further allows existing techniques to be adapted to given separation problems and objectives.
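The weighted-sum fusion described in the abstract can be illustrated with a minimal sketch. It assumes time-domain signals stored as plain Python lists, fixed hand-picked coefficients, and non-overlapping frames for the time-varying case; none of these choices comes from the report, which estimates the coefficients by non-linear optimization, Bayesian model averaging, or deep neural networks. The function names (`fuse_static`, `fuse_time_varying`, `sdr_db`) are hypothetical, and `sdr_db` is a simple energy ratio rather than the full SDR metric used in the experiments.

```python
import math


def fuse_static(estimates, weights):
    """Weighted sum of source estimates, one fixed coefficient per estimate."""
    if len(estimates) != len(weights):
        raise ValueError("need exactly one weight per estimate")
    n = len(estimates[0])
    if any(len(e) != n for e in estimates):
        raise ValueError("all estimates must have the same length")
    return [sum(w * e[t] for w, e in zip(weights, estimates)) for t in range(n)]


def fuse_time_varying(estimates, frame_weights, frame_len):
    """Weighted sum whose coefficients change from frame to frame.

    frame_weights[k][m] is the weight of estimate m in frame k
    (hypothetical layout; frames do not overlap in this sketch).
    Samples past the last frame reuse the last frame's weights.
    """
    n = len(estimates[0])
    fused = []
    for t in range(n):
        k = min(t // frame_len, len(frame_weights) - 1)
        w = frame_weights[k]
        fused.append(sum(w[m] * estimates[m][t] for m in range(len(estimates))))
    return fused


def sdr_db(reference, estimate):
    """Energy ratio between the reference and the estimation error, in dB.

    Simplified stand-in for SDR, used here only to compare estimates.
    """
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return math.inf if error == 0 else 10.0 * math.log10(signal / error)


# Toy example: two estimates with opposite biases around the true source.
reference = [1.0, -1.0, 1.0, -1.0]
est_a = [1.5, -0.5, 1.5, -0.5]   # reference + 0.5
est_b = [0.5, -1.5, 0.5, -1.5]   # reference - 0.5

fused = fuse_static([est_a, est_b], [0.5, 0.5])
print(sdr_db(reference, est_a), sdr_db(reference, fused))
```

In this toy case the two errors cancel, so the fused estimate beats either individual estimate, which is the effect that selecting a single model cannot achieve. Real separation errors are not this conveniently anti-correlated, so the gain depends on how the coefficients are estimated.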

Keywords

Deep Learning; Variational Bayes; Model Averaging; Non-Negative Matrix Factorization; Speech Enhancement; Singing Voice Extraction; Fusion; Audio Source Separation; Deep Neural Networks; Aggregation; Ensemble

Domains

Signal and Image Processing [eess.SP]; Machine Learning [stat.ML]; Neural Networks [cs.NE]

Main file

journal14_vs.pdf (335.23 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-01120685

Submitted on: Wednesday, March 4, 2015, 11:19:05

Last modified on: Monday, October 9, 2023, 12:49:39

Long-term archiving on: Friday, June 5, 2015, 10:06:39

Dates and versions

hal-01120685, version 1 (2015-03-04)

hal-01120685, version 2 (2015-10-20)

hal-01120685, version 3 (2016-02-20)

hal-01120685, version 4 (2016-04-09)

Identifiers

HAL Id: hal-01120685, version 1

Cite

Xabier Jaureguiberry, Emmanuel M. Vincent, Gael Richard. Fusion Methods for Audio Source Separation. [Research Report] Télécom ParisTech; Inria Nancy, équipe Multispeech. 2014. ⟨hal-01120685v1⟩


697 views

1364 downloads
