
Modeling the neural network responsible for song learning

Silvia Pagliarini 1, 2, 3
2 Mnemosyne - Mnemonic Synergy
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest, IMN - Institut des Maladies Neurodégénératives [Bordeaux]
Abstract : During the first period of their life, babies and juvenile birds show comparable phases of vocal development: first, they listen to their parents or tutors in order to build a neural representation of the auditory stimulus they experience; then they start to produce sound and progressively get closer to reproducing their tutor's song. This latter phase of learning is called the sensorimotor phase and is characterized by babbling in babies and subsong in birds. It ends when the song crystallizes and becomes similar to the one produced by adults.

It is possible to find analogies between the brain pathways responsible for sensorimotor learning in humans and birds: a vocal production pathway involves direct projections from auditory areas to motor neurons, and a vocal learning pathway is responsible for imitation and plasticity. Behavioral studies and the neuroanatomical structure of the vocal control circuit in humans and birds provide the basis for bio-inspired models of vocal learning. In particular, birds have brain circuits exclusively dedicated to song learning, making them an ideal model for exploring the representation of vocal learning by imitation of tutors.

This thesis aims to build a model of the vocal learning that underlies song learning in birds. The thesis includes an extensive review of the existing literature: many previous studies have attempted to implement imitative learning in computational models, and they share a common structure. These learning architectures include the learning mechanisms and, in some cases, exploration and evaluation strategies. A motor control function enables sound production, and a sensory response function models either how sound is perceived or how it shapes the reward. The inputs and outputs of these functions lie (1) in the motor space (the space of motor parameters), (2) in the sensory space (real sounds), and (3) either in the perceptual space (a low-dimensional representation of the sound) or in the internal representation of goals (a non-perceptual representation of the target sound).

The first model proposed in this thesis is a theoretical inverse model based on a simplified vocal learning model in which the sensory space coincides with the motor space (i.e., there is no sound production). This simplification allows us to investigate how to introduce biological assumptions (e.g., a non-linear response) into a vocal learning model and which parameters most influence the computational power of the model. The influence of the sharpness of auditory selectivity and of the motor dimension is discussed.

To obtain a complete model (one able to both perceive and produce sound), we needed a motor control function capable of producing sounds similar to real data (e.g., recordings of adult canaries). We analyzed the capability of WaveGAN (a Generative Adversarial Network) to provide a generator model able to produce realistic canary songs. In this generator model, the input space becomes the latent space after training and allows a high-dimensional dataset to be represented on a lower-dimensional manifold. We obtained realistic canary sounds using only three dimensions for the latent space. Among other results, quantitative and qualitative analyses demonstrate the interpolation abilities of the model, which suggests that the generator model we studied can be used as a motor function in a vocal learning model.

The second version of the sensorimotor model is a complete vocal learning model with a full action-perception loop (i.e., it includes the motor space, the sensory space, and the perceptual space). Sound production is performed by the GAN generator previously obtained. A recurrent neural network classifying syllables serves as the perceptual sensory response. As in the first model, the mapping between the perceptual space and the motor space is learned via an inverse model. Preliminary results show the influence of the learning rate when different sensory response functions are implemented.
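
Illustrative sketch (not taken from the thesis): the simplified setting described in the abstract, where the sensory space coincides with the motor space and an inverse model maps sensory responses back to motor commands, can be pictured with a toy example. Everything below is an assumption made for illustration only: the `tanh` sensory response, the linear inverse map, the delta-rule update, the random "babbling" exploration, and all function and parameter names. The thesis's actual learning rule and response functions may differ.

```python
import math
import random


def sensory_response(motor, gain=1.0):
    """Hypothetical non-linear response; sensory space == motor space."""
    return [math.tanh(gain * m) for m in motor]


def predict(weights, sensory):
    """Linear inverse model: map a sensory vector back to a motor command."""
    return [sum(w * s for w, s in zip(row, sensory)) for row in weights]


def train_inverse_model(dim=3, steps=5000, learning_rate=0.05, gain=1.0, seed=0):
    """Learn the inverse map from random motor exploration (assumed delta rule)."""
    rng = random.Random(seed)
    weights = [[0.0] * dim for _ in range(dim)]
    for _ in range(steps):
        # Random exploration: draw a motor command and observe its response.
        motor = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        sensory = sensory_response(motor, gain)
        guess = predict(weights, sensory)
        # Delta rule: move the prediction toward the command actually issued.
        for i in range(dim):
            err = motor[i] - guess[i]
            for j in range(dim):
                weights[i][j] += learning_rate * err * sensory[j]
    return weights


def imitation_error(weights, target_motor, gain=1.0):
    """Distance between a target command and the command the model recovers."""
    guess = predict(weights, sensory_response(target_motor, gain))
    return math.sqrt(sum((t - g) ** 2 for t, g in zip(target_motor, guess)))


target = [0.5, -0.3, 0.8]  # hypothetical "tutor" motor command
untrained = [[0.0] * 3 for _ in range(3)]
error_before = imitation_error(untrained, target)
error_after = imitation_error(train_inverse_model(), target)
```

Comparing `error_before` with `error_after` shows whether the learned map recovers the target command better than an untrained one. Varying `gain`, `dim`, or `learning_rate` in this toy setting loosely mirrors the kind of parameter study the abstract mentions, without reproducing it.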
Contributor : Abes Star
Submitted on : Wednesday, May 5, 2021 - 10:25:13 AM
Last modification on : Wednesday, May 12, 2021 - 3:09:56 AM


Version validated by the jury (STAR)


  • HAL Id : tel-03217834, version 1


Silvia Pagliarini. Modeling the neural network responsible for song learning. Modeling and Simulation. Université de Bordeaux, 2021. English. ⟨NNT : 2021BORD0107⟩. ⟨tel-03217834⟩


