Conference papers

A weakly-supervised discriminative model for audio-to-score alignment

Rémi Lajugie 1, 2 Piotr Bojanowski 1, 3 Philippe Cuvillier 4, 5 Sylvain Arlot 6 Francis Bach 1, 2
2 SIERRA - Statistical Machine Learning and Parsimony
DI-ENS - Département d'informatique de l'École normale supérieure, CNRS - Centre National de la Recherche Scientifique, Inria de Paris
3 WILLOW - Models of visual object recognition and scene understanding
Inria de Paris, DI-ENS - Département d'informatique de l'École normale supérieure
4 Repmus - Représentations musicales
STMS - Sciences et Technologies de la Musique et du Son
5 MuTant - Synchronous Realtime Processing and Programming of Music Signals
Inria de Paris, UPMC - Université Pierre et Marie Curie - Paris 6, IRCAM, CNRS - Centre National de la Recherche Scientifique
Abstract : In this paper, we consider a new discriminative approach to the problem of audio-to-score alignment. We consider the two distinct kinds of information provided by music scores: (i) an exact ordered list of musical events and (ii) approximate prior information about the relative duration of events. We extend the basic dynamic time warping algorithm to a convex problem that learns optimal classifiers for all events while jointly aligning files, using this weak supervision only. We show that the relative duration between events can easily be used as a penalization of our cost function, which drastically improves the performance of our approach. We demonstrate the validity of our approach on a large and realistic dataset.
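For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of basic dynamic time warping applied to audio-to-score alignment. It is not the convex formulation of the paper: it assumes the frame-to-event costs are already given (the paper learns the classifiers that would produce them) and it omits the relative-duration penalty. The function name `dtw_align` and the toy cost matrix in the usage note are hypothetical, introduced only for illustration.

```python
import math


def dtw_align(cost):
    """Monotone alignment of audio frames to an ordered list of score events.

    cost[t][k] is an assumed, precomputed local cost of assigning audio
    frame t to score event k. Each frame is assigned to exactly one event,
    events appear in score order, and every event receives at least one frame.
    Returns (total_cost, path) where path[t] is the event index of frame t.
    """
    T, K = len(cost), len(cost[0])
    if T < K:
        raise ValueError("need at least as many frames as events")

    # acc[t][k]: cheapest cost of aligning frames 0..t with frame t on event k
    acc = [[math.inf] * K for _ in range(T)]
    acc[0][0] = cost[0][0]  # the first frame must belong to the first event
    for t in range(1, T):
        for k in range(K):
            stay = acc[t - 1][k]  # remain on the same event
            advance = acc[t - 1][k - 1] if k > 0 else math.inf  # next event
            acc[t][k] = cost[t][k] + min(stay, advance)

    # Backtrack from the last frame, which must belong to the last event
    k = K - 1
    path = [k]
    for t in range(T - 1, 0, -1):
        if k > 0 and acc[t - 1][k - 1] < acc[t - 1][k]:
            k -= 1
        path.append(k)
    path.reverse()
    return acc[T - 1][K - 1], path
```

As a usage illustration, with five frames, three events, and a toy cost that is 0 when a frame matches an event and 1 otherwise, calling `dtw_align` on the matrix `[[0, 1, 1], [0, 1, 1], [1, 0, 1], [1, 1, 0], [1, 1, 0]]` recovers the assignment of the first two frames to the first event, the third frame to the second event, and the last two frames to the third event.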

Contributor : Philippe Cuvillier
Submitted on : Tuesday, January 5, 2016 - 3:12:06 PM
Last modification on : Monday, February 10, 2020 - 6:13:49 PM
Document(s) archived on : Thursday, April 7, 2016 - 3:26:31 PM


Files produced by the author(s)


  • HAL Id : hal-01251018, version 1


Rémi Lajugie, Piotr Bojanowski, Philippe Cuvillier, Sylvain Arlot, Francis Bach. A weakly-supervised discriminative model for audio-to-score alignment. 41st International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Mar 2016, Shanghai, China. ⟨hal-01251018⟩


