Spectral and Cepstral Audio Noise Reduction Techniques in Speech Emotion Recognition

Abstract: Signal noise reduction can improve the performance of machine learning systems that process time signals such as audio. For real-life use, such recognition systems must maintain their performance under variable, challenging conditions such as noisy environments. In this contribution, we investigate audio signal denoising methods in the cepstral and log-spectral domains and compare them with common implementations of standard techniques. The approaches are first compared in general terms using averaged acoustic distance metrics. They are then applied to the automatic recognition of spontaneous, natural emotions under simulated noisy conditions typical of smartphone recordings. Emotion recognition is implemented as support vector regression for continuous-valued prediction of arousal and valence on a realistic multimodal database. In the experiments, the proposed methods are generally found to outperform standard noise reduction algorithms.
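The abstract contrasts the proposed cepstral and log-spectral methods with standard noise reduction techniques and compares their outputs using averaged acoustic distance metrics. As a point of reference only, the sketch below implements one classic standard technique, magnitude spectral subtraction, and one common averaged distance, the frame-averaged log-spectral distance. It is not the authors' proposed method. The function names, frame length, hop size, the assumption that the leading frames are noise-only, and the spectral floor are all illustrative choices.

```python
import numpy as np

# Illustrative sketch only: a classic magnitude spectral subtraction baseline and a
# frame-averaged log-spectral distance. NOT the cepstral/log-spectral methods of the paper.
# All names and default parameters below are hypothetical choices for the example.

FRAME_LEN = 256  # samples per frame (assumed)
HOP = 128        # 50% overlap (assumed)


def stft(x, frame_len=FRAME_LEN, hop=HOP):
    """Return one-sided spectra (frames x bins) of periodic-Hann-windowed frames."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    # Periodic Hann window: copies shifted by a 50% hop sum to exactly 1.
    window = np.hanning(frame_len + 1)[:-1]
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length, frame_len=FRAME_LEN, hop=HOP):
    """Overlap-add inverse of stft(); exact for samples covered by two frames."""
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    out = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out


def spectral_subtraction(x, noise_frames=10, alpha=1.0, floor=0.02):
    """Subtract an average noise magnitude spectrum, keeping the noisy phase."""
    spec = stft(x)
    mag, phase = np.abs(spec), np.angle(spec)
    # Assumption: the first `noise_frames` frames contain noise only (static estimate).
    noise_mag = mag[:noise_frames].mean(axis=0)
    # The spectral floor keeps a small fraction of the noisy magnitude to limit "musical noise".
    clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)
    return istft(clean_mag * np.exp(1j * phase), len(x))


def log_spectral_distance(ref, test, eps=1e-10):
    """Frame-averaged RMS difference (dB) between the log power spectra of two signals."""
    p_ref = np.abs(stft(ref)) ** 2
    p_test = np.abs(stft(test)) ** 2
    diff_db = 10 * np.log10(p_ref + eps) - 10 * np.log10(p_test + eps)
    return float(np.mean(np.sqrt(np.mean(diff_db ** 2, axis=1))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    t = np.arange(sr) / sr
    clean = 0.5 * np.sin(2 * np.pi * 440 * t)
    clean[: sr // 8] = 0.0  # leading silence, so the first frames are noise-only
    noisy = clean + 0.1 * rng.standard_normal(sr)
    denoised = spectral_subtraction(noisy)
    print("LSD noisy   :", log_spectral_distance(clean, noisy))
    print("LSD denoised:", log_spectral_distance(clean, denoised))
```

In frames where the clean reference is exactly zero, the distance depends heavily on `eps`, so the printed values are best read relative to each other. The static noise estimate taken from the leading frames only suits stationary noise.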

Cited literature: 30 references

Contributor: Fabien Ringeval
Submitted on: Wednesday, March 22, 2017 - 4:11:01 PM
Last modification on: Tuesday, February 12, 2019 - 1:31:18 AM
Archived on: Friday, June 23, 2017 - 1:49:15 PM


Files produced by the author(s)


Public Domain




Jouni Pohjalainen, Fabien Ringeval, Zixing Zhang, Björn Schuller. Spectral and Cepstral Audio Noise Reduction Techniques in Speech Emotion Recognition. Proceedings of the 24th ACM International Conference on Multimedia (ACM MM), 2016, Amsterdam, Netherlands, pp. 670-674. DOI: 10.1145/2964284.2967306. HAL: hal-01494062.
