Journal article, IEEE/ACM Transactions on Audio, Speech and Language Processing, Year: 2017

Supervised Representation Learning for Audio Scene Classification

A Rakotomamonjy

Abstract

This paper investigates the use of supervised feature learning approaches for extracting relevant and discriminative features from acoustic scene recordings. Owing to the recent release of open datasets for acoustic scene classification (ASC) problems, representation learning techniques can now be envisioned for solving the problem of feature extraction. This paper takes a step towards this goal by first studying models based on convolutional neural networks (ConvNets). Because the scale of the available datasets is still small compared to those available in computer vision, we also introduce a technical contribution denoted as supervised non-negative matrix factorization (SNMF). Our goal with SNMF is to induce the matrix decomposition to carry discriminative information in addition to the usual generative information. We achieve this objective by augmenting the NMF optimization problem with a novel loss function related to the class labels of acoustic scenes. Our experiments show that, despite the small-scale setting, supervised feature learning is competitive with the current state-of-the-art features. We also point out that for smaller-scale datasets, supervised NMF is indeed slightly less prone to overfitting than convolutional neural networks. While the performance of these learned features is interesting per se, a deeper analysis of their behavior in the acoustic scene problem context raises open and difficult questions that, we believe, need to be addressed for further performance breakthroughs.
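The abstract does not state the SNMF objective, so the following is only a generic sketch of the idea it describes: a non-negative factorization X ≈ WH whose activations H are also asked to predict class labels. The softmax cross-entropy loss, the linear classifier `C`, the trade-off weight `lam`, the function name `supervised_nmf`, and the use of projected gradient descent are all assumptions chosen for illustration; none of them is taken from the paper.

```python
import numpy as np

def supervised_nmf(X, y, n_components=4, lam=0.1, lr=1e-3, n_iter=200, seed=0):
    """Illustrative label-aware NMF (NOT the paper's formulation).

    X : (n_features, n_samples) non-negative data, e.g. spectrogram columns.
    y : (n_samples,) integer class labels starting at 0.
    Minimizes 0.5 * ||X - W H||_F^2 + lam * cross_entropy(softmax(C H), y)
    subject to W >= 0 and H >= 0, using projected gradient descent.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    n_classes = int(y.max()) + 1
    W = rng.random((n_features, n_components))   # non-negative dictionary
    H = rng.random((n_components, n_samples))    # non-negative activations (features)
    C = np.zeros((n_classes, n_components))      # assumed linear classifier on H
    Y = np.eye(n_classes)[y].T                   # one-hot labels, (n_classes, n_samples)
    history = []
    for _ in range(n_iter):
        R = W @ H - X                            # reconstruction residual
        Z = C @ H                                # class scores
        Z -= Z.max(axis=0, keepdims=True)        # stabilize the softmax
        P = np.exp(Z)
        P /= P.sum(axis=0, keepdims=True)
        recon = 0.5 * np.sum(R ** 2)             # generative (reconstruction) term
        xent = -np.sum(Y * np.log(P + 1e-12))    # discriminative (label) term
        history.append(recon + lam * xent)
        G = P - Y                                # gradient of xent w.r.t. the scores
        grad_W = R @ H.T
        grad_H = W.T @ R + lam * (C.T @ G)
        grad_C = lam * (G @ H.T)
        # Gradient step, then projection onto the non-negative orthant for W and H.
        W = np.maximum(W - lr * grad_W, 0.0)
        H = np.maximum(H - lr * grad_H, 0.0)
        C = C - lr * grad_C                      # the classifier is unconstrained
    return W, H, C, history

# Toy usage on random data (purely synthetic, not an ASC dataset).
demo_rng = np.random.default_rng(0)
X_demo = demo_rng.random((6, 20))
y_demo = demo_rng.integers(0, 3, size=20)
W, H, C, history = supervised_nmf(X_demo, y_demo, n_components=4, n_iter=50)
```

Setting `lam=0` in this sketch reduces it to plain NMF fitted by projected gradient descent, which makes the role of the label term easy to isolate when comparing the learned activations.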
Main file
casadeep.pdf (1.12 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01354115, version 1 (17-08-2016)

Identifiers

  • HAL Id: hal-01354115, version 1

Cite

A Rakotomamonjy. Supervised Representation Learning for Audio Scene Classification. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2017. ⟨hal-01354115⟩
