Improving singing voice separation using Deep U-Net and Wave-U-Net with data augmentation

Alice Cohen-Hadria; Axel Roebel; Geoffroy Peeters

Preprint, Working Paper. Year: 2019

Alice Cohen-Hadria

Role: Author
PersonId : 999469

Analyse et synthèse sonores [Paris]

Axel Roebel

Role: Author
PersonId : 4527
IdHAL : axel-roebel
ORCID : 0000-0001-6136-4391
IdRef : 227186079

Analyse et synthèse sonores [Paris]

Geoffroy Peeters

Role: Author
PersonId : 6738
IdHAL : geoffroy-peeters
ORCID : 0000-0001-5255-3019
IdRef : 187470472

Institut Polytechnique de Paris

Laboratoire Traitement et Communication de l'Information

Signal, Statistique et Apprentissage

Abstract

State-of-the-art singing voice separation relies on deep learning, using CNN architectures with skip connections (such as the U-Net, Wave-U-Net, or MSDENSELSTM models). A key to the success of these models is the availability of a large amount of training data. In this study, we address singing voice separation for mono signals and compare the U-Net and the Wave-U-Net, which are structurally similar but operate on different input representations. First, we report a few results on variations of the U-Net model. Second, we discuss the potential of state-of-the-art speech and music transformation algorithms for augmenting existing datasets, and we demonstrate that the effect of these augmentations depends on the signal representation used by the model. The results show a considerable improvement from augmentation for both models. However, pitch transposition is the most effective augmentation strategy for the U-Net model, while transposition, time stretching, and formant shifting have a much more balanced effect on the Wave-U-Net model. Finally, we compare the two models on the same dataset.

Domains

Sound [cs.SD]; Signal and Image Processing [eess.SP]

Contributor: Axel Roebel

https://hal.science/hal-02457072

Submitted on: Monday, 27 January 2020, 17:47:59

Last modified on: Saturday, 7 October 2023, 21:36:22

Dates and versions

hal-02457072 , version 1 (27-01-2020)

Identifiers

HAL Id : hal-02457072 , version 1
ARXIV : 1903.01415

Cite

Alice Cohen-Hadria, Axel Roebel, Geoffroy Peeters. Improving singing voice separation using Deep U-Net and Wave-U-Net with data augmentation. 2019. ⟨hal-02457072⟩

Collections

INSTITUT-TELECOM CNRS PARISTECH IRCAM STMS SORBONNE-UNIVERSITE LTCI IDS S2A SU-SCIENCES

77 views

0 downloads
