Histogram of gradients of Time-Frequency Representations for Audio scene detection - Archive ouverte HAL
Journal article, IEEE/ACM Transactions on Audio, Speech and Language Processing, Year: 2015

Histogram of gradients of Time-Frequency Representations for Audio scene detection

Alain Rakotomamonjy, Gilles Gasso

Abstract

This paper addresses the problem of audio scene classification and contributes to the state of the art by proposing a novel feature. We build this feature by computing histograms of gradients (HOG) of a time-frequency representation of an audio scene. We hypothesize that, unlike classical audio features such as MFCC, histograms of gradients can encode relevant information in a time-frequency representation, namely the local direction of variation (in time and frequency) of the signal's spectral power. In addition, to gain invariance and robustness, the histograms of gradients are locally pooled. We have evaluated the relevance of the novel feature by comparing its performance with state-of-the-art competitors on several datasets, including a novel one that we provide as part of our contribution. This dataset, which we make publicly available, involves 19 classes and contains about 900 minutes of audio scene recordings. We therefore believe that it may become the next standard dataset for evaluating audio scene classification algorithms. Our comparison results clearly show that our HOG-based features outperform their competitors.
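The feature described in the abstract (histograms of gradient orientations computed on a time-frequency image, then pooled) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a plain STFT log-power spectrogram as the time-frequency representation, unsigned gradient orientations quantized into 8 bins over 8×8 cells, L2-normalized blocks of 2×2 cells, and, as the pooling step, an average of the block descriptors over time. The function names `log_spectrogram`, `hog_cells` and `pooled_hog`, and every parameter value, are hypothetical choices; the paper's representation, cell sizes and pooling scheme may differ.

```python
import numpy as np


def log_spectrogram(x, n_fft=256, hop=128):
    """Log-power STFT (assumed representation): rows are frequency bins, columns are frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    power = np.abs(np.fft.rfft(frames, axis=0)) ** 2
    return np.log(power + 1e-10)  # small floor avoids log(0)


def hog_cells(tf, cell=8, n_bins=8):
    """Magnitude-weighted histograms of unsigned gradient orientation, one per cell."""
    g_freq, g_time = np.gradient(tf)  # derivatives along frequency (axis 0) and time (axis 1)
    mag = np.hypot(g_time, g_freq)
    # Orientation 0 = power varies along time, pi/2 = power varies along frequency.
    ang = np.mod(np.arctan2(g_freq, g_time), np.pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    n_f, n_t = tf.shape[0] // cell, tf.shape[1] // cell
    hist = np.zeros((n_f, n_t, n_bins))
    for i in range(n_f):
        for j in range(n_t):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            hist[i, j] = np.bincount(
                bins[rows, cols].ravel(), weights=mag[rows, cols].ravel(), minlength=n_bins
            )
    return hist


def pooled_hog(tf, cell=8, n_bins=8, block=2):
    """L2-normalized block descriptors, averaged over time (one possible pooling choice)."""
    hist = hog_cells(tf, cell, n_bins)
    n_f, n_t, _ = hist.shape
    n_bf, n_bt = n_f - block + 1, n_t - block + 1
    blocks = np.zeros((n_bf, n_bt, block * block * n_bins))
    for i in range(n_bf):
        for j in range(n_bt):
            v = hist[i : i + block, j : j + block].ravel()
            blocks[i, j] = v / (np.linalg.norm(v) + 1e-8)
    return blocks.mean(axis=1).ravel()  # pool over time, keep frequency position


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene = rng.standard_normal(8000)  # stand-in for 1 s of audio at a hypothetical 8 kHz rate
    print(pooled_hog(log_spectrogram(scene)).shape)  # (480,)
```

Because orientation is measured from the time axis and is unsigned, onsets and offsets (power changing along time) fall near orientation 0, in the first and last bins, whereas the edges of a steady tone (power changing along frequency) fall near π/2, in the middle bins. Averaging over time makes the feature length independent of the recording duration, which suits a fixed-size classifier input; pooling over smaller local regions would keep more temporal detail.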
Main file
hogforcasa.pdf (5 MB)
Origin: files produced by the author(s)

Dates and versions

hal-00951990, version 1 (26-02-2014)
hal-00951990, version 2 (06-03-2014)
hal-00951990, version 3 (29-08-2014)
hal-00951990, version 4 (11-06-2015)
hal-00951990, version 5 (02-07-2015)
hal-00951990, version 6 (03-08-2015)

Cite

Alain Rakotomamonjy, Gilles Gasso. Histogram of gradients of Time-Frequency Representations for Audio scene detection. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2015, 23 (1), pp.142-153. ⟨10.1109/TASLP.2014.2375575⟩. ⟨hal-00951990v6⟩
