Real-Time Audio-to-Score Alignment of Singing Voice Based on Melody and Lyric Information

Rong Gong; Philippe Cuvillier; Nicolas Obin; Arshia Cont

Conference paper. Year: 2015


Rong Gong

Role: Author

Synchronous Realtime Processing and Programming of Music Signals

Philippe Cuvillier

Role: Author
PersonId : 5616
IdHAL : philippe-cuvillier
IdRef : 200552538

Synchronous Realtime Processing and Programming of Music Signals

Nicolas Obin

Role: Author
PersonId : 7042
IdHAL : nicolas-obin
ORCID : 0000-0002-5236-5306
IdRef : 157523799

Analyse et synthèse sonores [Paris]

Arshia Cont

Role: Author
PersonId : 6067
IdHAL : arshiacont
ORCID : 0000-0002-7352-7212
IdRef : 131109758

Représentations musicales

Synchronous Realtime Processing and Programming of Music Signals

Abstract

The singing voice is distinctive in music: a vocal performance conveys both musical content (melody/pitch) and lyrical content (text/phonemes). This paper exploits both melody and lyric information for real-time audio-to-score alignment of singing voice. First, lyrics are added as a separate observation stream to a template-based hidden semi-Markov model (HSMM), whose observation model is built from vowel templates. Second, early and late fusion of melody and lyric information are applied during real-time audio-to-score alignment. An experiment conducted with two professional singers (one male, one female) shows that the performance of a lyrics-based system is comparable to that of melody-based score-following systems. Furthermore, late fusion of melody and lyric information substantially improves alignment performance. Finally, maximum a posteriori (MAP) adaptation of the vowel templates from one singer to the other suggests that lyric information can be used efficiently for any singer.
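The abstract does not state how the two streams are fused, so the following is only a minimal sketch of what late fusion can look like in general: each stream scores the candidate score states separately, and the scores are combined afterwards (early fusion would instead merge the features before a single model sees them). The function name `late_fusion`, the parameter `melody_weight`, the weighted log-linear rule, and the toy numbers are all assumptions made for illustration, not the authors' method.

```python
import math


def late_fusion(melody_loglik, lyrics_loglik, melody_weight=0.5):
    """Fuse per-state log-likelihoods from two observation streams.

    Hypothetical rule (not taken from the paper): a weighted sum in the
    log domain, followed by normalisation into a posterior over states.
    """
    if len(melody_loglik) != len(lyrics_loglik):
        raise ValueError("both streams must score the same set of states")
    if not 0.0 <= melody_weight <= 1.0:
        raise ValueError("melody_weight must lie in [0, 1]")

    # Weighted log-linear combination of the two streams.
    fused = [
        melody_weight * m + (1.0 - melody_weight) * l
        for m, l in zip(melody_loglik, lyrics_loglik)
    ]

    # Log-sum-exp normalisation, shifted by the maximum for stability.
    peak = max(fused)
    log_norm = peak + math.log(sum(math.exp(f - peak) for f in fused))
    return [math.exp(f - log_norm) for f in fused]


# Toy frame with three candidate score states (made-up likelihoods).
# The melody stream cannot separate states 0 and 1 (same pitch),
# while the lyrics stream prefers state 1 (different vowel).
melody = [math.log(0.45), math.log(0.45), math.log(0.10)]
lyrics = [math.log(0.10), math.log(0.70), math.log(0.20)]

posterior = late_fusion(melody, lyrics)
best_state = max(range(len(posterior)), key=posterior.__getitem__)
```

In this toy frame the melody stream alone is ambiguous between two states, and the lyric stream breaks the tie. That is one intuition for why combining the streams could help, although the paper's actual gains come from its own experiments.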

Keywords

singing voice, real-time audio-to-score alignment, lyrics, spectral envelope, information fusion, singer adaptation

Domains

Signal and Image Processing [eess.SP]; Machine Learning [stat.ML]; Sound [cs.SD]

Main file

index.pdf (330.53 KB)

Origin: Files produced by the author(s)

Contributor: Nicolas Obin

https://hal.science/hal-01164550

Submitted on: Wednesday, June 17, 2015, 11:17:41

Last modified on: Friday, March 24, 2023, 14:53:00

Long-term archiving on: Tuesday, April 25, 2017, 11:09:47

Dates and versions

hal-01164550 , version 1 (17-06-2015)

Identifiers

HAL Id: hal-01164550, version 1

Cite

Rong Gong, Philippe Cuvillier, Nicolas Obin, Arshia Cont. Real-Time Audio-to-Score Alignment of Singing Voice Based on Melody and Lyric Information. Interspeech, Sep 2015, Dresden, Germany. ⟨hal-01164550⟩


Collections

UPMC CNRS INRIA IRCAM STMS INRIA2 SORBONNE-UNIVERSITE SU-SCIENCES ANR

411 views

772 downloads
