Automatic chord extraction and musical structure prediction through semi-supervised learning, application to human-computer improvisation

Tristan Carsault

Report (Research Report) Year: 2017

Role: Author

Représentations musicales

Abstract

Human-computer co-improvisation aims to use a computer to produce a musical accompaniment to a musician’s improvisation. Recently, the notion of guidance has been introduced to enhance this process. Although guidance has already been studied, either step by step or through a formal temporal structure, it usually relies only on a memory of past events. This memory is derived from an annotated corpus, which limits the ability to infer the potential future structure of the improvisation. Nevertheless, most improvisations are based on long-term structures or grids. Our study targets these aspects and provides short-term predictions of musical structures to improve the quality of computer co-improvisation.

Our aim is to develop software that interacts in real time with a musician by inferring expected structures. To achieve this goal, we divide the project into two main tasks: a listening module and a symbolic generation module. The listening module extracts the musical structure played by the musician, whereas the generation module predicts musical sequences based on these extractions. In this report, we present a first approach towards this goal by introducing an automatic chord extraction module and a chord label sequence generator.

Regarding structure extraction, since the current state-of-the-art results in automatic chord extraction are obtained with Convolutional Neural Networks (CNNs), we first study new architectures derived from CNNs for this task. However, as we underline in our study, the scarcity of labeled audio data could limit the use of machine learning algorithms. Hence, we also propose the use of Ladder Networks (LNs), which can be trained in a semi-supervised way. This allows us to evaluate whether unlabeled music data can improve chord extraction. Regarding the chord label generator, many recent works have shown the success of Recurrent Neural Networks (RNNs) in generative temporal applications. We therefore use a type of recurrent network, the Long Short-Term Memory (LSTM) network, for our generative task. We present our implementations and compare the results of our models with the current state of the art, showing that we obtain comparable results on the reference evaluation datasets. Finally, we introduce the overall architecture of the software linking both modules and propose some directions for future work.
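As an illustration of the two-module split described in the abstract, the sketch below wires a listening step to a generation step. It is not the report's implementation: the CNN and Ladder Network extractors are replaced here by simple triad template matching on a 12-bin chroma vector, and the LSTM generator is replaced by first-order transition counts. All names (`chord_templates`, `extract_chord`, `BigramChordPredictor`) and the toy chord sequence are hypothetical, chosen only to show how the two modules could exchange chord labels.

```python
from collections import Counter, defaultdict

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_templates():
    """Build binary 12-bin templates for the 24 major and minor triads."""
    templates = {}
    for root in range(12):
        for suffix, third in (("", 4), ("m", 3)):
            bins = [0.0] * 12
            for interval in (0, third, 7):
                bins[(root + interval) % 12] = 1.0
            templates[NOTE_NAMES[root] + suffix] = bins
    return templates


def extract_chord(chroma, templates):
    """Listening stand-in (assumption: template matching, not a CNN).

    Returns the label whose template has the highest dot product
    with the given chroma vector.
    """
    def score(name):
        return sum(c * t for c, t in zip(chroma, templates[name]))
    return max(templates, key=score)


class BigramChordPredictor:
    """Generation stand-in (assumption: transition counts, not an LSTM)."""

    def __init__(self):
        self.transitions = defaultdict(Counter)

    def fit(self, sequences):
        """Count how often each chord label follows another."""
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.transitions[prev][nxt] += 1

    def predict_next(self, chord):
        """Return the most frequent follower, or the chord itself if unseen."""
        followers = self.transitions.get(chord)
        if not followers:
            return chord
        return followers.most_common(1)[0][0]


# Toy usage: one chroma frame goes through both modules.
templates = chord_templates()
predictor = BigramChordPredictor()
predictor.fit([["C", "Am", "F", "G", "C", "Am", "F", "G"]])  # made-up grid

chroma_frame = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # energy on C, E, G
heard = extract_chord(chroma_frame, templates)
predicted = predictor.predict_next(heard)
print(heard, "->", predicted)
```

The only contract between the two stand-ins is a chord label string, which mirrors the abstract's separation between audio-level extraction and symbolic prediction; either side could be swapped for a learned model without changing the other.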

Domains

Machine Learning [cs.LG]; Signal and Image Processing [eess.SP]; Sound [cs.SD]; Artificial Intelligence [cs.AI]; Data Structures and Algorithms [cs.DS]; Multimedia [cs.MM]; Other [cs.OH]; Music, musicology and performing arts

Contributor: Jérôme Nika

https://hal.science/hal-01877337

Submitted on: Wednesday, September 19, 2018, 15:57:44

Last modified on: Saturday, October 7, 2023, 21:36:22

Dates and versions

Identifiers

HAL Id: hal-01877337, version 1

Cite

Tristan Carsault. Automatic chord extraction and musical structure prediction through semi-supervised learning, application to human-computer improvisation. [Research Report] Ircam UMR STMS 9912. 2017. ⟨hal-01877337⟩

Collections

CNRS IRCAM STMS LARA SORBONNE-UNIVERSITE SU-SCIENCES ANR MUSCI
