Universal audio synthesizer control with normalizing flows

Philippe Esling; Naotake Masuda; Adrien Bardet; Romeo Despres; Axel Chemla--Romeu-Santos

Conference paper, Year: 2019


Philippe Esling

Role: Author
PersonId : 14916
IdHAL : philippe-esling
ORCID : 0000-0002-1655-7909
IdRef : 172472873

Sciences et Technologies de la Musique et du Son

Institut de Recherche et Coordination Acoustique/Musique

Naotake Masuda

Role: Author

Tokyo University of Science [Tokyo]

Adrien Bardet

Role: Author

Laboratoire d'Informatique de l'Université du Mans

Romeo Despres

Role: Author

Sciences et Technologies de la Musique et du Son

Axel Chemla--Romeu-Santos

Role: Author
PersonId : 182346
IdHAL : axel-chemla-romeu-santos
ORCID : 0000-0001-7329-6533
IdRef : 25312624X

Sciences et Technologies de la Musique et du Son

Università degli Studi di Milano = University of Milan

Abstract

The ubiquity of sound synthesizers has reshaped music production and has even defined entirely new music genres. However, the increasing complexity and number of parameters in modern synthesizers make them harder to master. Hence, methods that make it easy to create and explore sounds with synthesizers are a crucial need. Here, we introduce a novel formulation of audio synthesizer control. We formalize it as finding an organized latent audio space that represents the capabilities of a synthesizer, while constructing an invertible mapping to the space of its parameters. Using this formulation, we show that we can simultaneously address automatic parameter inference, macro-control learning and audio-based preset exploration within a single model. To solve this new formulation, we rely on Variational Auto-Encoders (VAE) and Normalizing Flows (NF) to organize and map the respective auditory and parameter spaces. We introduce disentangling flows, which perform the invertible mapping between separate latent spaces while steering the organization of some latent dimensions to match target variation factors by splitting the objective as partial density evaluation. We evaluate our proposal against a large set of baseline models and show its superiority in both parameter inference and audio reconstruction. We also show that the model disentangles the major factors of audio variation as latent dimensions that can be used directly as macro-parameters, and that it is able to learn semantic controls of a synthesizer by smoothly mapping to its parameters. Finally, we discuss the use of our model in creative applications and its real-time implementation in Ableton Live.
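As an illustration of what an invertible mapping with a tractable density means in the abstract above, the following minimal sketch implements a single generic affine coupling layer, a common building block of normalizing flows. It is not the authors' model: the abstract does not specify which flow architecture is used, and the names `coupling_forward`, `coupling_inverse`, `toy_scale` and `toy_shift` are hypothetical, with the two toy functions standing in for small learned networks. The sketch only shows that a vector (here playing the role of an audio latent code) can be mapped to another vector (playing the role of a parameter code) and recovered exactly, while the log-determinant of the Jacobian is obtained at no extra cost.

```python
import math

def coupling_forward(z, scale_fn, shift_fn):
    """Affine coupling: the first half passes through unchanged and
    conditions an element-wise affine transform of the second half."""
    d = len(z) // 2
    z_a, z_b = z[:d], z[d:]
    s = scale_fn(z_a)  # log-scales computed from the untouched half
    t = shift_fn(z_a)  # shifts computed from the untouched half
    v_b = [zb * math.exp(si) + ti for zb, si, ti in zip(z_b, s, t)]
    log_det = sum(s)   # log |det Jacobian| of the transform
    return z_a + v_b, log_det

def coupling_inverse(v, scale_fn, shift_fn):
    """Exact inverse of coupling_forward."""
    d = len(v) // 2
    v_a, v_b = v[:d], v[d:]
    s = scale_fn(v_a)
    t = shift_fn(v_a)
    z_b = [(vb - ti) * math.exp(-si) for vb, si, ti in zip(v_b, s, t)]
    return v_a + z_b

# Hypothetical conditioners; in practice these would be learned networks.
def toy_scale(x):
    return [math.tanh(0.5 * xi) for xi in x]

def toy_shift(x):
    return [0.1 * xi + 0.2 for xi in x]

z = [0.3, -1.2, 0.8, 0.05]  # assumed 4-dimensional latent code
v, log_det = coupling_forward(z, toy_scale, toy_shift)
z_back = coupling_inverse(v, toy_scale, toy_shift)
max_err = max(abs(a - b) for a, b in zip(z, z_back))
```

Because the inverse reuses the same conditioner outputs, `z_back` matches `z` up to floating-point error. Stacking several such layers, with the roles of the two halves swapped between layers, is the usual way to build a more expressive flow.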

Domains

Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]; Music, musicology and performing arts


https://hal.science/hal-02471340

Submitted on: Saturday, February 8, 2020, 03:29:24

Last modified on: Thursday, November 23, 2023, 14:44:05

Dates and versions

hal-02471340 , version 1 (08-02-2020)

Identifiers

HAL Id : hal-02471340 , version 1
ARXIV : 1907.00971

Cite

Philippe Esling, Naotake Masuda, Adrien Bardet, Romeo Despres, Axel Chemla--Romeu-Santos. Universal audio synthesizer control with normalizing flows. International Conference on Digital Audio Effects (DaFX 2019), Sep 2019, Birmingham, United Kingdom. ⟨hal-02471340⟩


Collections

CNRS UNIV-LEMANS IRCAM STMS SORBONNE-UNIVERSITE SU-SCIENCES MUSCI

