Automatic chord extraction and musical structure prediction through semi-supervised learning, application to human-computer improvisation

Tristan Carsault 1
1 Repmus - Représentations musicales
STMS - Sciences et Technologies de la Musique et du Son
Abstract: Human-computer co-improvisation aims to rely on a computer to produce a musical accompaniment to a musician's improvisation. Recently, the notion of guidance has been introduced to enhance this process. Although guidance has already been studied step by step or through a formal temporal structure, it is usually based only on a memory of past events. This memory is derived from an annotated corpus, which limits the possibility of inferring the potential future structure of the improvisation. Nevertheless, most improvisations are based on long-term structures or grids. Our study addresses these aspects by providing short-term predictions of musical structures in order to improve the quality of computer co-improvisation. Our aim is to develop software that interacts in real time with a musician by inferring expected structures. To achieve this goal, we divide the project into two main tasks: a listening module and a symbolic generation module. The listening module extracts the musical structure played by the musician, whereas the generative module predicts musical sequences based on these extractions. In this report, we present a first approach towards this goal by introducing an automatic chord extraction module and a chord label sequence generator. Regarding structure extraction, since the current state-of-the-art results in automatic chord extraction are obtained with Convolutional Neural Networks (CNN), we first study new CNN-derived architectures for this task. However, as we underline in our study, the small amount of labeled audio data could limit the use of machine learning algorithms. Hence, we also propose the use of Ladder Networks (LN), which can be trained in a semi-supervised way. This allows us to evaluate whether unlabeled music data can improve chord extraction.
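For reference, semi-supervised training of a Ladder Network is usually expressed as a single cost that adds a supervised term, computed on labeled examples only, to layer-wise denoising terms computed on all examples, labeled or not. The formulation below is the standard one from the Ladder Network literature and is given here as an assumption; the exact variant and weights used in this report may differ.

```latex
C = -\frac{1}{N_L}\sum_{n=1}^{N_L} \log P\big(\tilde{y}(n) = t(n) \mid x(n)\big)
  + \sum_{l=0}^{L} \frac{\lambda_l}{N\, m_l} \sum_{n=1}^{N}
    \big\lVert z^{(l)}(n) - \hat{z}^{(l)}_{\mathrm{BN}}(n) \big\rVert^2
```

Here $N_L$ is the number of labeled inputs $x(n)$ with chord labels $t(n)$, $N$ is the total number of inputs (labeled and unlabeled), $\tilde{y}$ is the output of the noise-corrupted encoder, $z^{(l)}$ are the clean encoder activations at layer $l$ of width $m_l$, $\hat{z}^{(l)}_{\mathrm{BN}}$ is the decoder's reconstruction normalized with the clean encoder's batch statistics, and the $\lambda_l$ are per-layer weighting hyperparameters. Unlabeled audio contributes only through the second sum, which is how it can help the chord classifier.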
Regarding the chord label generator, many recent works have shown the success of Recurrent Neural Networks (RNN) for generative temporal applications. Thus, we use a particular kind of recurrent network, built on Long Short-Term Memory (LSTM) units, for our generative task. We present our implementations and the results of our models, comparing them with the current state of the art, and show that we obtain comparable results on the seminal evaluation datasets. Finally, we introduce the overall architecture of the software linking both modules and propose some directions for future work.
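To make the generative side more concrete, here is a minimal, untrained sketch of how an LSTM can score the next chord label given the previous ones: labels are one-hot encoded, fed one step at a time through an LSTM cell, and the final hidden state is projected to a softmax over the chord vocabulary. The toy vocabulary, layer sizes, random initialization and all helper names (`lstm_step`, `next_chord_distribution`, etc.) are hypothetical and not taken from the report; a real model would be trained on a chord corpus, typically with a deep learning library.

```python
import math
import random

# Hypothetical toy chord vocabulary (not the report's label set).
CHORDS = ["C:maj", "F:maj", "G:maj", "A:min"]
HIDDEN = 8  # hypothetical hidden-state size


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h, c, params):
    """One LSTM time step: input, forget, output gates and candidate cell."""
    z = x + h  # concatenate current input and previous hidden state
    pre = {}
    for name in ("i", "f", "o", "g"):
        W, b = params[name]
        pre[name] = [a + bb for a, bb in zip(matvec(W, z), b)]
    i = [sigmoid(v) for v in pre["i"]]
    f = [sigmoid(v) for v in pre["f"]]
    o = [sigmoid(v) for v in pre["o"]]
    g = [math.tanh(v) for v in pre["g"]]
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def init_params(n_in, n_hidden, n_out, seed=0):
    # Random, untrained weights: for illustration only.
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    params = {k: (mat(n_hidden, n_in + n_hidden), [0.0] * n_hidden) for k in "ifog"}
    params["out"] = (mat(n_out, n_hidden), [0.0] * n_out)
    return params


def one_hot(idx, n):
    v = [0.0] * n
    v[idx] = 1.0
    return v


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def next_chord_distribution(sequence, params, n_hidden):
    """Run the LSTM over a chord sequence and return P(next chord)."""
    h = [0.0] * n_hidden
    c = [0.0] * n_hidden
    for label in sequence:
        x = one_hot(CHORDS.index(label), len(CHORDS))
        h, c = lstm_step(x, h, c, params)
    W, b = params["out"]
    logits = [a + bb for a, bb in zip(matvec(W, h), b)]
    return softmax(logits)


params = init_params(len(CHORDS), HIDDEN, len(CHORDS))
probs = next_chord_distribution(["C:maj", "F:maj", "G:maj"], params, HIDDEN)
print(CHORDS[probs.index(max(probs))], [round(p, 3) for p in probs])
```

Because the weights are random, the predicted chord is meaningless here; training (e.g. minimizing cross-entropy against the true next label) is what would make the distribution musically informed. Sampling from `probs` instead of taking the maximum would yield varied sequences for generation.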
Contributor: Jérôme Nika
Submitted on: Wednesday, September 19, 2018, 15:57:44
Last modified on: Sunday, October 7, 2018, 01:11:03


  • HAL Id: hal-01877337, version 1



Tristan Carsault. Automatic chord extraction and musical structure prediction through semi-supervised learning, application to human-computer improvisation. [Research Report] Ircam UMR STMS 9912. 2017. 〈hal-01877337〉


