Cross-Modal Variational Inference For Bijective Signal-Symbol Translation

Axel Chemla-Romeu-Santos; Stavros Ntalampiras; Philippe Esling; Goffredo Haus; Gérard Assayag

Journal article, Proceedings of the 22nd International Conference on Digital Audio Effects (DAFx-19), Year: 2019

Axel Chemla-Romeu-Santos

Role: Author
PersonId : 182346
IdHAL : axel-chemla-romeu-santos
ORCID : 0000-0001-7329-6533
IdRef : 25312624X

Représentations musicales

Laboratorio d'Informatica Musicale

Stavros Ntalampiras

Role: Author

Laboratorio d'Informatica Musicale

Philippe Esling

Role: Author
PersonId : 14916
IdHAL : philippe-esling
ORCID : 0000-0002-1655-7909
IdRef : 172472873

Représentations musicales

Goffredo Haus

Role: Author
PersonId : 1064507

Laboratorio d'Informatica Musicale

Gérard Assayag

Role: Author
PersonId : 1501
IdHAL : gerard-assayag
ORCID : 0000-0002-4427-7373
IdRef : 069359326

Représentations musicales

Abstract

Extracting symbolic information from signals is an active field of research that enables numerous applications, especially in the Music Information Retrieval domain. This complex task, which is related to other topics such as pitch extraction and instrument recognition, has given rise to numerous approaches, mostly based on advanced signal processing algorithms. However, these techniques are often non-generic: they allow the extraction of definite physical properties of the signal (pitch, octave), but not arbitrary vocabularies or more general annotations. Moreover, these techniques are one-sided: they can extract symbolic data from an audio signal, but cannot perform the reverse process of symbol-to-signal generation. In this paper, we propose a bijective approach to signal/symbol translation by turning this problem into a density estimation task over the signal and symbolic domains, both considered as related random variables. We estimate this joint distribution with two different variational auto-encoders, one for each domain, whose inner representations are forced to match through an additive constraint, allowing both models to learn and generate separately while also enabling signal-to-symbol and symbol-to-signal inference. In this article, we test our models on pitch, octave and dynamics symbols, which constitute a fundamental step towards music transcription and label-constrained audio generation. In addition to its versatility, this system is rather light during training and generation, and allows several interesting creative uses that we outline at the end of the article.
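
As a rough illustration of the objective described in the abstract, the sketch below adds a latent-matching penalty to the losses of two variational auto-encoders, one per domain. It is not the authors' implementation: the abstract does not specify the form of the additive constraint, so the symmetrized KL divergence between the two diagonal-Gaussian posteriors, the standard normal prior, and the names `joint_loss`, `match_weight`, `post_signal` and `post_symbol` are all assumptions made for this example. Reconstruction errors are passed in as plain numbers, since the encoder and decoder networks are out of scope here.

```python
import math


def kl_diag_gaussians(mu_p, logvar_p, mu_q, logvar_q):
    """KL( N(mu_p, diag(exp(logvar_p))) || N(mu_q, diag(exp(logvar_q))) )."""
    total = 0.0
    for mp, lp, mq, lq in zip(mu_p, logvar_p, mu_q, logvar_q):
        total += 0.5 * (lq - lp + (math.exp(lp) + (mp - mq) ** 2) / math.exp(lq) - 1.0)
    return total


def kl_to_standard_normal(mu, logvar):
    """KL from a diagonal Gaussian posterior to the N(0, I) prior (assumed prior)."""
    zeros = [0.0] * len(mu)
    return kl_diag_gaussians(mu, logvar, zeros, zeros)


def joint_loss(recon_signal, recon_symbol, post_signal, post_symbol, match_weight=1.0):
    """Two per-domain negative ELBOs plus an additive latent-matching term.

    recon_signal, recon_symbol: reconstruction errors of each auto-encoder.
    post_signal, post_symbol: (mu, logvar) of each encoder's latent posterior.
    match_weight: hypothetical weight of the matching constraint.
    """
    mu_x, logvar_x = post_signal
    mu_y, logvar_y = post_symbol
    loss_signal = recon_signal + kl_to_standard_normal(mu_x, logvar_x)
    loss_symbol = recon_symbol + kl_to_standard_normal(mu_y, logvar_y)
    # Assumed constraint: symmetrized KL between the two latent posteriors.
    match = 0.5 * (
        kl_diag_gaussians(mu_x, logvar_x, mu_y, logvar_y)
        + kl_diag_gaussians(mu_y, logvar_y, mu_x, logvar_x)
    )
    return loss_signal + loss_symbol + match_weight * match
```

Because the matching term is simply added to two otherwise independent losses, each auto-encoder can still be trained and sampled on its own. Under this assumption, translation would amount to encoding with one model and decoding with the other once the two latent spaces agree.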

Domains

Computer Science [cs]; Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]; Signal and Image Processing [eess.SP]

Main file

DAFx2019_paper_36.pdf (1.5 MB)

Origin: Files produced by the author(s)

Contributor: Axel Chemla--Romeu-Santos

https://hal.science/hal-02471810

Submitted on: Sunday, February 9, 2020, 14:22:31

Last modified on: Thursday, November 23, 2023, 14:44:05

Long-term archiving on: Sunday, May 10, 2020, 12:43:40

Dates and versions

hal-02471810, version 1 (09-02-2020)

Identifiers

HAL Id: hal-02471810, version 1

Cite

Axel Chemla-Romeu-Santos, Stavros Ntalampiras, Philippe Esling, Goffredo Haus, Gérard Assayag. Cross-Modal Variational Inference For Bijective Signal-Symbol Translation. Proceedings of the 22nd International Conference on Digital Audio Effects (DAFx-19), 2019. ⟨hal-02471810⟩

Collections

CNRS IRCAM STMS SORBONNE-UNIVERSITE SU-SCIENCES

129 views

67 downloads
