Polyphonic Sound Event Tracking using Linear Dynamical Systems

Abstract: In this paper, a system for polyphonic sound event detection and tracking is proposed, based on spectrogram factorisation techniques and state space models. The system extends probabilistic latent component analysis (PLCA) and is built around a 4-dimensional spectral template dictionary indexed by frequency, sound event class, exemplar index, and sound state. In order to jointly track multiple overlapping sound events over time, linear dynamical systems (LDS) are integrated within the PLCA inference. The system assumes that the PLCA sound event activation is the (noisy) observation in an LDS, with the latent states corresponding to the true event activations. LDS training is achieved using fully observed data, making use of ground-truth-informed event activations produced by the PLCA-based model. Several LDS variants are evaluated on polyphonic datasets of office sounds generated with an acoustic scene simulator, as well as on real and synthesized monophonic datasets for comparison. Results show that integrating LDS tracking within PLCA improves the frame-based F-measure by +8.5% to +10.5% compared with using the PLCA model alone. In addition, the proposed system outperforms several state-of-the-art methods for the task of polyphonic sound event detection.
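The LDS idea in the abstract (noisy activations as observations, true activations as latent states, training from fully observed data) can be illustrated with a minimal sketch. This is a simplified post-processing view under stated assumptions, not the authors' method: the paper integrates the LDS inside the PLCA inference, whereas here a standard Kalman filter is simply run over a given activation matrix. The function names, the synthetic on/off activations, the noise level, the 0.5 detection threshold, and the frame-based F-measure definition (pooled over all frames and classes) are all illustrative assumptions.

```python
import numpy as np


def fit_lds_fully_observed(X, Y):
    """Closed-form LDS estimates when latent states X (T x K) and
    observations Y (T x D) are both observed (assumed model):
        x_t = A x_{t-1} + w_t,  w_t ~ N(0, Q)
        y_t = C x_t + v_t,      v_t ~ N(0, R)
    """
    X_prev, X_next = X[:-1], X[1:]
    # Least-squares regression of x_t on x_{t-1}: X_next ~ X_prev @ A.T
    A = np.linalg.lstsq(X_prev, X_next, rcond=None)[0].T
    # Least-squares regression of y_t on x_t: Y ~ X @ C.T
    C = np.linalg.lstsq(X, Y, rcond=None)[0].T
    # Noise covariances estimated from the regression residuals
    W = X_next - X_prev @ A.T
    V = Y - X @ C.T
    Q = W.T @ W / len(W)
    R = V.T @ V / len(V)
    # Initial-state prior: mean and (regularised) covariance of the states
    mu0 = X.mean(axis=0)
    P0 = np.atleast_2d(np.cov(X, rowvar=False)) + 1e-6 * np.eye(X.shape[1])
    return A, C, Q, R, mu0, P0


def kalman_filter(Y, A, C, Q, R, mu0, P0):
    """Forward filtering pass: returns E[x_t | y_1..y_t] for every frame."""
    T, K = Y.shape[0], A.shape[0]
    means = np.zeros((T, K))
    m, P = mu0, P0
    for t in range(T):
        if t > 0:  # predict step
            m = A @ m
            P = A @ P @ A.T + Q
        # Update step using the noisy activation y_t
        S = C @ P @ C.T + R
        G = P @ C.T @ np.linalg.inv(S)
        m = m + G @ (Y[t] - C @ m)
        P = (np.eye(K) - G @ C) @ P
        means[t] = m
    return means


def frame_f_measure(ref, est):
    """Frame-based F-measure pooled over a binary (frames x classes) roll."""
    ref, est = np.asarray(ref, bool), np.asarray(est, bool)
    tp = np.sum(ref & est)
    fp = np.sum(~ref & est)
    fn = np.sum(ref & ~est)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def synthetic_activations(T, K, rng, mean_len=50):
    """Hypothetical stand-in for true event activations: random on/off blocks."""
    X = np.zeros((T, K))
    for k in range(K):
        t, on = 0, False
        while t < T:
            length = rng.geometric(1.0 / mean_len)
            X[t:t + length, k] = float(on)
            t += length
            on = not on
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = 3  # number of event classes (assumed)
    # Training pairs: "true" activations and noisy copies of them
    X_train = synthetic_activations(2000, K, rng)
    Y_train = X_train + 0.5 * rng.standard_normal(X_train.shape)
    params = fit_lds_fully_observed(X_train, Y_train)

    X_test = synthetic_activations(2000, K, rng)
    Y_test = X_test + 0.5 * rng.standard_normal(X_test.shape)
    X_hat = kalman_filter(Y_test, *params)

    ref = X_test > 0.5
    print("raw F:     ", frame_f_measure(ref, Y_test > 0.5))
    print("filtered F:", frame_f_measure(ref, X_hat > 0.5))
```

Because both states and observations are available during training, every LDS parameter reduces to a least-squares fit plus a residual covariance, so no EM iterations are needed in this sketch. The filter runs forward only; a smoother (a backward pass) would also use future frames.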
Document type :
Journal articles

Cited literature [33 references]

https://hal.archives-ouvertes.fr/hal-01520194
Contributor : Mathieu Lagrange
Submitted on : Wednesday, May 10, 2017 - 9:33:19 AM
Last modification on : Friday, May 17, 2019 - 9:22:06 AM
Long-term archiving on : Friday, August 11, 2017 - 12:15:28 PM

File

taslp-plca-lds.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01520194, version 1

Citation

Emmanouil Benetos, Grégoire Lafay, Mathieu Lagrange, Mark Plumbley. Polyphonic Sound Event Tracking using Linear Dynamical Systems. IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2017, 25 (6), pp.1266-1277. ⟨hal-01520194⟩