Feature Learning with Matrix Factorization Applied to Acoustic Scene Classification

In this paper, we study the usefulness of various matrix factorization methods for learning features for Acoustic Scene Classification (ASC). A common way of addressing ASC has been to engineer features capable of capturing the specificities of acoustic environments. Instead, we show that better representations of the scenes can be learned automatically from time-frequency representations using matrix factorization techniques. We mainly focus on extensions of Principal Component Analysis and Nonnegative Matrix Factorization, including sparse, kernel-based, and convolutive variants as well as a novel supervised dictionary learning variant. An experimental evaluation is performed on two of the largest ASC datasets available in order to compare these methods and discuss their usefulness for the task. We show that the unsupervised learning methods provide better representations of acoustic scenes than the best conventional hand-crafted features on both datasets. Furthermore, introducing a novel nonnegative supervised matrix factorization model and Deep Neural Networks trained on spectrograms allows us to reach further improvements.
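
As an illustration of the general idea of learning features from a time-frequency representation with Nonnegative Matrix Factorization, the following is a minimal sketch and not the authors' implementation. It assumes the standard multiplicative updates for the Euclidean cost, uses a random nonnegative matrix as a stand-in for a spectrogram, and pools the activations by a time average; the function name `nmf`, the rank, the iteration count, and the pooling step are all hypothetical choices made for this example.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factorize a nonnegative matrix V (freq x frames) as V ~ W @ H.

    Uses the standard multiplicative updates for the Euclidean cost.
    This is an illustrative assumption, not the paper's exact algorithm.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps    # dictionary (spectral basis vectors)
    H = rng.random((rank, n_frames)) + eps  # activations (candidate features)
    errors = []
    for _ in range(n_iter):
        # Update activations, then dictionary; both stay nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors


# Synthetic nonnegative matrix standing in for a spectrogram
# (40 frequency bins, 100 time frames); real data would replace this.
rng = np.random.default_rng(1)
V = rng.random((40, 100))

W, H, errors = nmf(V, rank=8)

# Hypothetical pooling: average activations over time to get one
# fixed-length feature vector per recording.
features = H.mean(axis=1)
```

In this sketch the columns of `W` play the role of a learned dictionary and `features` is the vector that would be passed to a classifier; the supervised and kernel-based variants mentioned in the abstract are not covered here.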

Keywords

Acoustic Scene Classification, Feature learning, Matrix Factorization

Domains

Machine Learning [cs.LG], Sound [cs.SD]

Main file

bisot2017.pdf (631.05 KB)

Origin: Files produced by the author(s)

Contributor: Victor Bisot

https://hal.science/hal-01362864

Submitted on: Thursday, 24 August 2017, 09:30:17

Last modified on: Monday, 9 October 2023, 12:49:39

Dates and versions

hal-01362864, version 1 (09-09-2016)

hal-01362864, version 2 (24-08-2017)

Identifiers

HAL Id: hal-01362864, version 2
DOI: 10.1109/TASLP.2017.2690570

Cite

Victor Bisot, Romain Serizel, Slim Essid, Gael Richard. Feature Learning with Matrix Factorization Applied to Acoustic Scene Classification. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2017, 25 (6), pp. 1216-1229. ⟨10.1109/TASLP.2017.2690570⟩. ⟨hal-01362864v2⟩

Collections

INSTITUT-TELECOM CNRS INRIA PARISTECH UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD LTCI IDS S2A