Real-Time Monophonic and Polyphonic Audio Classification from Power Spectra

Maxime Baelde; Christophe Biernacki; Raphaël Greff

doi:10.1016/j.patcog.2019.03.017

Journal article, Pattern Recognition, Year: 2019

Maxime Baelde

Role: Author
PersonId : 1017368

A-Volute [Roubaix]

MOdel for Data Analysis and Learning

Laboratoire Paul Painlevé - UMR 8524

Christophe Biernacki

Role: Author
PersonId : 853117

MOdel for Data Analysis and Learning

Raphaël Greff

Role: Author

A-Volute [Roubaix]

Abstract

This work addresses the recurring challenge of real-time monophonic and polyphonic audio source classification. The proposed process operates directly on the whole normalized power spectrum (NPS), avoiding complex and error-prone traditional feature extraction. The NPS is also a natural candidate for polyphonic events thanks to its additive property in such cases. Classification is performed through nonparametric, kernel-based generative modeling of the power spectrum. The advantage of this model is twofold: it is almost hypothesis-free, and it yields the maximum a posteriori classification rule for online signals in a straightforward way. Moreover, it uses the monophonic dataset to build the polyphonic one. To reach the real-time target, the complexity of the method can then be tuned through a standard hierarchical clustering preprocessing of the prototypes, which gives a particularly efficient trade-off between computation time and classification accuracy. The proposed method, called RARE (Real-time Audio Recognition Engine), shows encouraging results in both monophonic and polyphonic classification tasks on benchmark and in-house datasets, including in the targeted real-time setting. In particular, it has several advantages over state-of-the-art methods: reduced training time, no feature extraction, control over the computation-accuracy trade-off, and no need to train on already mixed sounds for polyphonic classification.
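The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the shared bandwidth, the equal-energy mixing weight, the synthetic sine-tone data, and all function names below are assumptions made for illustration only. The sketch computes the NPS of a frame, scores each class with a kernel density estimate over that class's prototype spectra, applies a maximum a posteriori rule, and builds polyphonic prototypes by combining monophonic ones.

```python
import numpy as np

def normalized_power_spectrum(frame):
    """Power spectrum of one frame, scaled so that its bins sum to 1."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    total = power.sum()
    if total == 0.0:
        # Silent frame: fall back to a flat spectrum (illustrative choice).
        return np.full_like(power, 1.0 / power.size)
    return power / total

def class_log_likelihood(nps, prototypes, bandwidth):
    """Log of a Gaussian-kernel density estimate at `nps` (assumed kernel).

    The kernel's normalizing constant is omitted: with one shared bandwidth
    and dimension it is identical for every class and cancels in the argmax.
    """
    sq_dist = ((prototypes - nps) ** 2).sum(axis=1)
    log_kernels = -sq_dist / (2.0 * bandwidth ** 2)
    peak = log_kernels.max()
    # Numerically stable log of the mean of the kernel values.
    return peak + np.log(np.mean(np.exp(log_kernels - peak)))

def map_classify(nps, prototypes_by_class, priors, bandwidth=0.05):
    """Return the label maximizing log prior + log class-conditional density."""
    scores = {
        label: np.log(priors[label]) + class_log_likelihood(nps, protos, bandwidth)
        for label, protos in prototypes_by_class.items()
    }
    return max(scores, key=scores.get)

def mix_prototypes(protos_a, protos_b, weight=0.5):
    """Polyphonic prototypes as convex combinations of monophonic NPS pairs.

    Assumes the power spectra of the two sources add, so the NPS of the
    mixture is a weighted sum of the individual NPS vectors.
    """
    mixed = weight * protos_a[:, None, :] + (1.0 - weight) * protos_b[None, :, :]
    return mixed.reshape(-1, protos_a.shape[1])

# Synthetic data: two "sources" that are sine tones with random phases.
rng = np.random.default_rng(0)
sample_rate, frame_len = 8000, 256
t = np.arange(frame_len) / sample_rate

def tone(freq_hz):
    return np.sin(2 * np.pi * freq_hz * t + rng.uniform(0, 2 * np.pi))

low = np.array([normalized_power_spectrum(tone(500.0)) for _ in range(5)])
high = np.array([normalized_power_spectrum(tone(2000.0)) for _ in range(5)])
prototypes = {"low": low, "high": high, "low+high": mix_prototypes(low, high)}
priors = {label: 1.0 / len(prototypes) for label in prototypes}

mono_label = map_classify(normalized_power_spectrum(tone(500.0)), prototypes, priors)
poly_label = map_classify(
    normalized_power_spectrum(tone(500.0) + tone(2000.0)), prototypes, priors
)
print(mono_label, poly_label)
```

Note that the "low+high" class is built only from monophonic prototypes, mirroring the abstract's point that no training on already mixed sounds is needed. The hierarchical clustering step that reduces the number of prototypes is not shown here.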

Keywords

Polyphonic; Monophonic; Audio classification; Machine learning; Real-time; Nonparametric estimation; Generative model

Domains

Machine Learning [stat.ML]; Signal and Image Processing [eess.SP]

Main file

article_v3.pdf (481.51 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-01834221

Submitted on: Monday, March 11, 2019, 08:37:04

Last modified on: Friday, April 19, 2024, 14:04:05

Long-term archiving on: Wednesday, June 12, 2019, 12:49:06

Dates and versions

hal-01834221, version 1 (10-07-2018)

hal-01834221, version 2 (14-01-2019)

hal-01834221, version 3 (11-03-2019)

Identifiers

HAL Id: hal-01834221, version 3
DOI: 10.1016/j.patcog.2019.03.017

Cite

Maxime Baelde, Christophe Biernacki, Raphaël Greff. Real-Time Monophonic and Polyphonic Audio Classification from Power Spectra. Pattern Recognition, 2019, 92, pp.82-92. ⟨10.1016/j.patcog.2019.03.017⟩. ⟨hal-01834221v3⟩

Collections

CNRS INRIA INRIA2 UNIV-LILLE LPP-MATH
