Real-Time Audio-to-Score Alignment of Singing Voice Based on Melody and Lyric Information

Rong Gong (1), Philippe Cuvillier (1), Nicolas Obin (2), Arshia Cont (3, 1)
1 MuTant - Synchronous Realtime Processing and Programming of Music Signals
Inria Paris-Rocquencourt, UPMC - Université Pierre et Marie Curie - Paris 6, IRCAM, CNRS - Centre National de la Recherche Scientifique
2 Analyse et synthèse sonores [Paris]
STMS - Sciences et Technologies de la Musique et du Son
3 Repmus - Représentations musicales
STMS - Sciences et Technologies de la Musique et du Son
Abstract: The singing voice is distinctive in music: a vocal performance conveys both musical (melody/pitch) and lyrical (text/phoneme) content. This paper aims to exploit the advantages of melody and lyric information for real-time audio-to-score alignment of singing voice. First, lyrics are added as a separate observation stream to a template-based hidden semi-Markov model (HSMM), whose observation model is based on the construction of vowel templates. Second, early and late fusion of melody and lyric information are applied during real-time audio-to-score alignment. An experiment conducted with two professional singers (male/female) shows that the performance of a lyrics-based system is comparable to that of melody-based score-following systems. Furthermore, late fusion of melody and lyric information substantially improves alignment performance. Finally, maximum a posteriori (MAP) adaptation of the vowel templates from one singer to the other suggests that lyric information can be used efficiently for any singer.
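As a rough illustration of what late fusion of two observation streams can look like, the sketch below combines per-state log-likelihoods from a melody stream and a lyrics stream with a weighted sum in the log domain. This is an assumption made for illustration only: the abstract does not state the fusion rule, and the function name `late_fusion`, the `weight` parameter, and the toy probabilities are all hypothetical.

```python
import math


def late_fusion(log_lik_melody, log_lik_lyrics, weight=0.5):
    """Fuse per-state log-likelihoods from two streams (hypothetical helper).

    Assumption: fusion is a weighted sum of log-likelihoods, i.e. a weighted
    geometric mean of the likelihoods. The paper may use a different rule.
    """
    if len(log_lik_melody) != len(log_lik_lyrics):
        raise ValueError("both streams must score the same set of states")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        weight * m + (1.0 - weight) * l
        for m, l in zip(log_lik_melody, log_lik_lyrics)
    ]


# Toy frame: the melody stream cannot tell states 0 and 1 apart
# (for example, two consecutive notes at the same pitch), while the
# lyrics stream favours state 1 (a different vowel).
melody = [math.log(p) for p in (0.45, 0.45, 0.10)]
lyrics = [math.log(p) for p in (0.10, 0.80, 0.10)]

fused = late_fusion(melody, lyrics)
best_state = max(range(len(fused)), key=fused.__getitem__)
```

In this toy frame the fused scores pick out the state that the lyrics stream prefers, which is the kind of disambiguation a second stream is meant to provide. In a full system the fused scores would feed the HSMM decoding step, which is not shown here.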
Document type:
Conference paper
Interspeech, Sep 2015, Dresden, Germany

https://hal.archives-ouvertes.fr/hal-01164550
Contributor: Nicolas Obin
Submitted on: Wednesday, June 17, 2015 - 11:17:41
Last modified on: Thursday, January 11, 2018 - 06:27:23
Document(s) archived on: Tuesday, April 25, 2017 - 11:09:47

File

index.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01164550, version 1

Citation

Rong Gong, Philippe Cuvillier, Nicolas Obin, Arshia Cont. Real-Time Audio-to-Score Alignment of Singing Voice Based on Melody and Lyric Information. Interspeech, Sep 2015, Dresden, Germany. 〈hal-01164550〉
