Polyphonic Sound Event Tracking using Linear Dynamical Systems

Abstract : In this paper, a system for polyphonic sound event detection and tracking is proposed, based on spectrogram factorisation techniques and state space models. The system extends probabilistic latent component analysis (PLCA) and is modelled around a 4-dimensional spectral template dictionary of frequency, sound event class, exemplar index, and sound state. In order to jointly track multiple overlapping sound events over time, the integration of linear dynamical systems (LDS) within the PLCA inference is proposed. The system assumes that the PLCA sound event activation is the (noisy) observation in an LDS, with the latent states corresponding to the true event activations. LDS training is achieved using fully observed data, making use of ground truth-informed event activations produced by the PLCA-based model. Several LDS variants are evaluated, using polyphonic datasets of office sounds generated from an acoustic scene simulator, as well as real and synthesized monophonic datasets for comparative purposes. Results show that the integration of LDS tracking within PLCA leads to an improvement of +8.5-10.5% in terms of frame-based F-measure as compared to the use of the PLCA model alone. In addition, the proposed system outperforms several state-of-the-art methods for the task of polyphonic sound event detection.
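The idea of treating a noisy activation as the observation of an LDS whose latent state is the true activation can be illustrated with a minimal sketch. This is not the authors' system: it is a scalar Kalman filter applied to a single event class, whereas the paper tracks multiple events jointly. The function name `kalman_filter_1d`, the parameters `a`, `q`, `r`, `x0`, `p0`, and the example data are all hypothetical and chosen only for illustration.

```python
def kalman_filter_1d(observations, a=1.0, q=0.01, r=0.1, x0=0.0, p0=1.0):
    """Filter a noisy activation sequence with a scalar linear dynamical system.

    Assumed model (illustrative parameters, not taken from the paper):
        x_t = a * x_{t-1} + w_t,  w_t ~ N(0, q)   latent "true" activation
        y_t = x_t + v_t,          v_t ~ N(0, r)   noisy observed activation
    """
    x, p = x0, p0  # current state estimate and its variance
    estimates = []
    for y in observations:
        # Predict step: propagate the estimate through the dynamics.
        x_pred = a * x
        p_pred = a * a * p + q
        # Update step: blend the prediction with the new observation.
        k = p_pred / (p_pred + r)  # Kalman gain, between 0 and 1
        x = x_pred + k * (y - x_pred)
        p = (1.0 - k) * p_pred
        estimates.append(x)
    return estimates


# Hypothetical noisy activations for one event class, one value per frame.
noisy = [0.0, 0.1, 0.9, 0.2, 1.0, 0.8, 0.1, 0.0]
filtered = kalman_filter_1d(noisy)
```

In this sketch a larger `r` relative to `q` makes the filter trust the observed activations less and produces smoother estimates. In the paper the LDS parameters are learned from fully observed data rather than set by hand.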
Document type :
Journal articles

Contributor : Mathieu Lagrange
Submitted on : Wednesday, May 10, 2017 - 9:33:19 AM
Last modification on : Friday, August 5, 2022 - 2:54:51 PM
Long-term archiving on : Friday, August 11, 2017 - 12:15:28 PM


Files produced by the author(s)


  • HAL Id : hal-01520194, version 1


Emmanouil Benetos, Grégoire Lafay, Mathieu Lagrange, Mark D Plumbley. Polyphonic Sound Event Tracking using Linear Dynamical Systems. IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2017, 25 (6), pp.1266-1277. ⟨hal-01520194⟩


