Supervised Representation Learning for Audio Scene Classification

A Rakotomamonjy

Journal article. IEEE/ACM Transactions on Audio, Speech and Language Processing. Year: 2017

A Rakotomamonjy (1)

Role: Author
PersonId: 174806
IdHAL: arakotomamonjy
ORCID: 0000-0002-4210-7792
IdRef: 083002138

(1) Equipe Apprentissage

Abstract

This paper investigates supervised feature learning approaches for extracting relevant and discriminative features from acoustic scene recordings. Owing to the recent release of open datasets for acoustic scene classification (ASC), representation learning techniques can now be considered for solving the feature extraction problem. This paper takes a step towards this goal by first studying models based on convolutional neural networks (ConvNets). Because the available datasets are still small compared to those in computer vision, we also introduce a technical contribution, denoted supervised non-negative matrix factorization (SNMF). Our goal with SNMF is to induce the matrix decomposition to carry discriminative information in addition to the usual generative information. We achieve this by augmenting the NMF optimization problem with a novel loss function related to the class labels of acoustic scenes. Our experiments show that, despite the small-scale setting, supervised feature learning is competitive with current state-of-the-art features. We also point out that, for smaller-scale datasets, supervised NMF is slightly less prone to overfitting than convolutional neural networks. While the performance of these learned features is interesting per se, a deeper analysis of their behavior in the acoustic scene context raises open and difficult questions that, we believe, need to be addressed to achieve further performance breakthroughs.

Keywords

feature learning, time-frequency representation, audio scene classification, non-negative matrix factorization, convolutional neural networks

Domains

Computer science: Neural networks [cs.NE], Machine learning [cs.LG], Sound [cs.SD]

Main file

casadeep.pdf (1.12 MB)

Origin: Files produced by the author(s)

Contributor: Alain Rakotomamonjy

https://hal.science/hal-01354115

Submitted on: Wednesday, August 17, 2016, 13:58:26

Last modified on: Friday, December 22, 2023, 15:16:05

Long-term archiving on: Friday, November 18, 2016, 11:43:22

Dates and versions

hal-01354115, version 1 (17-08-2016)

Identifiers

HAL Id: hal-01354115, version 1

Cite

A Rakotomamonjy. Supervised Representation Learning for Audio Scene Classification. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2017. ⟨hal-01354115⟩

Collections

INSA-ROUEN LITIS COMUE-NORMANDIE UNIROUEN UNILEHAVRE INSA-GROUPE

163 views

767 downloads
